Disclaimer: these symptoms could 100% be explained by a bug in my code.
I have an HTTP server built with http.zig and using zqlite (coincidentally, by the same author); both are great libraries.
I have written tests for this server, stealing^H^H^H using code from both these libraries. The tests nicely exercise all the endpoints of the server, and check the HTTP responses. For some of the endpoints, the server inserts data in an SQLite database; for others, it reads the data back. Pretty standard stuff. The database is exclusive to this server, not shared with any other system.
My problem is, the tests are not reliable. Sometimes they fully pass, sometimes they die with a SIGABRT. Here is one excerpt of a failure:
info(daemon): Testing new device, label [Raspberry Pi] => 201
================================================================================
panic running "Daemon: auth user and device"
================================================================================
thread 69892 panic: index out of bounds: index 12, len 0
.../src/testing.zig:149:52: 0x11a7610 in parseHTTPResponse (test)
status = try std.fmt.parseInt(u16, line[9..12], 10);
^
.../src/server.zig:481:48: 0x123a2a8 in test.Daemon: auth user and device (test)
var res_device = try mini.parseResponse();
^
.../test_runner.zig:80:30: 0x122cdb8 in main (test)
const result = t.func();
^
../sysdeps/nptl/libc_start_call_main.h:59:16: 0x7fbd03027740 in __libc_start_call_main (../sysdeps/x86/libc-start.c)
../csu/libc-start.c:360:3: 0x7fbd03027878 in __libc_start_main_impl (../sysdeps/x86/libc-start.c)
???:?:?: 0x104ce24 in ??? (???)
test
└─ run test failure
error: process terminated with signal ABRT
failed command: ./.zig-cache/o/3fdea6cca786ed054f0ea9a7d3e4c59e/test
Build Summary: 6/8 steps succeeded (1 failed)
test transitive failure
└─ run test failure
error: the following build command failed with exit code 1:
.zig-cache/o/60ecfa57eb775377e7628cea33af855f/build mise/installs/zig/0.16.0/zig mise/installs/zig/0.16.0/lib ... .zig-cache /home/gonzo/.cache/zig --seed 0x6bba6048 -Zf7c4cb5141f72813 -Doptimize=ReleaseSafe test
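Reading the panic literally: `line` had length 0 when `line[9..12]` was sliced, so the test helper apparently saw an empty status line (possibly an empty read from the socket) rather than a malformed response. Below is a minimal sketch of a guarded version of that parse, which would turn the SIGABRT into an ordinary test error. The function name `parseStatusLine` and the error names are made up for illustration; this is not the actual code in `testing.zig`.

```zig
const std = @import("std");

/// Parses the status code out of an HTTP/1.x status line such as
/// "HTTP/1.1 201 Created". Returns an error instead of panicking when the
/// line is empty or too short to hold a three-digit code.
/// (Hypothetical helper; names are assumptions, not the real test code.)
fn parseStatusLine(line: []const u8) !u16 {
    // An empty line is the exact case from the panic: index 12, len 0.
    if (line.len == 0) return error.EmptyResponse;
    if (line.len < 12 or !std.mem.startsWith(u8, line, "HTTP/1.")) {
        return error.MalformedStatusLine;
    }
    // Bytes 9..12 hold the status code once the length check has passed.
    return std.fmt.parseInt(u16, line[9..12], 10) catch return error.MalformedStatusLine;
}

test "parseStatusLine reports errors instead of panicking" {
    try std.testing.expectEqual(@as(u16, 201), try parseStatusLine("HTTP/1.1 201 Created"));
    try std.testing.expectError(error.EmptyResponse, parseStatusLine(""));
    try std.testing.expectError(error.MalformedStatusLine, parseStatusLine("HTTP/1.1"));
}
```

This would not fix the underlying flakiness, but a distinct `error.EmptyResponse` would at least show whether the failing runs are all "no bytes received" cases.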
I am about to embark on a binary search through the code, stubbing out some parts and seeing if the tests become 100% reliable; I plan to start with the SQLite calls. But before I start down this long road, I was wondering if these symptoms ring a bell for anybody.
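For the stubbing step, one possible shape is a compile-time switch between the real storage and an in-memory stand-in, so the same tests run with and without SQLite in the picture. This is only a sketch: `stub_storage`, `RealStore`, `StubStore` and `insertDevice` are hypothetical names, and the real zqlite-backed code is deliberately left out.

```zig
const std = @import("std");

// Hypothetical switch: set to true to bypass the database while bisecting.
const stub_storage = true;

// Placeholder for the real zqlite-backed implementation (not shown here).
const RealStore = struct {
    pub fn insertDevice(_: *RealStore, label: []const u8) !u64 {
        _ = label;
        return error.NotImplemented;
    }
};

// In-memory stand-in: no SQLite calls, just hands out increasing ids.
const StubStore = struct {
    next_id: u64 = 1,

    pub fn insertDevice(self: *StubStore, label: []const u8) !u64 {
        _ = label;
        const id = self.next_id;
        self.next_id += 1;
        return id;
    }
};

// The rest of the server would only ever refer to `Store`.
const Store = if (stub_storage) StubStore else RealStore;

test "stub store hands out increasing ids" {
    var store = Store{};
    try std.testing.expectEqual(@as(u64, 1), try store.insertDevice("Raspberry Pi"));
    try std.testing.expectEqual(@as(u64, 2), try store.insertDevice("Laptop"));
}
```

If the tests stay flaky with the stub in place, SQLite is probably not the culprit and the search can move on to the HTTP side.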
Maybe unrelated, anecdotal information:
- I am building this code with `-Doptimize=ReleaseSafe` because the native backend gives me `unhandled relocation type R_X86_64_PC64`. All of the code is built like this; I have removed all the caches and recompiled many times.
- The `sqlite` C code is vendored in a directory in my project, and is compiled from `build.zig`.
- I have run the tests in a loop (with 5s pauses in between). I have seen tens / hundreds of runs with no issue, and then a `SIGABRT`. With this loop running, I closed my laptop lid, moved to another floor in the house and reopened; the tests started failing, repeatedly.
Thanks in advance for any wisdom shared. Cheers!