Strange crashes with http.zig and zqlite

Disclaimer: these symptoms could 100% be explained by a bug in my code.

I have an HTTP server built with http.zig and using zqlite (coincidentally, by the same author) – great libraries.

I have written tests for this server, stealing^H^H^H using code from both these libraries. The tests nicely exercise all the endpoints of the server, and check the HTTP responses. For some of the endpoints, the server inserts data in an SQLite database; for others, it reads the data back. Pretty standard stuff. The database is exclusive to this server – not shared with any other system.

My problem is, the tests are not reliable. Sometimes they fully pass, sometimes they die with a SIGABRT. Here is one excerpt of a failure:

info(daemon): Testing new device, label [Raspberry Pi] => 201
================================================================================
panic running "Daemon: auth user and device"
================================================================================
thread 69892 panic: index out of bounds: index 12, len 0
.../src/testing.zig:149:52: 0x11a7610 in parseHTTPResponse (test)
            status = try std.fmt.parseInt(u16, line[9..12], 10);
                                                   ^
.../src/server.zig:481:48: 0x123a2a8 in test.Daemon: auth user and device (test)
        var res_device = try mini.parseResponse();
                                               ^
.../test_runner.zig:80:30: 0x122cdb8 in main (test)
        const result = t.func();
                             ^
../sysdeps/nptl/libc_start_call_main.h:59:16: 0x7fbd03027740 in __libc_start_call_main (../sysdeps/x86/libc-start.c)
../csu/libc-start.c:360:3: 0x7fbd03027878 in __libc_start_main_impl (../sysdeps/x86/libc-start.c)
???:?:?: 0x104ce24 in ??? (???)
test
└─ run test failure
error: process terminated with signal ABRT
failed command: ./.zig-cache/o/3fdea6cca786ed054f0ea9a7d3e4c59e/test

Build Summary: 6/8 steps succeeded (1 failed)
test transitive failure
└─ run test failure

error: the following build command failed with exit code 1:
.zig-cache/o/60ecfa57eb775377e7628cea33af855f/build mise/installs/zig/0.16.0/zig mise/installs/zig/0.16.0/lib ... .zig-cache /home/gonzo/.cache/zig --seed 0x6bba6048 -Zf7c4cb5141f72813 -Doptimize=ReleaseSafe test

I am about to embark on a binary search through the code, stubbing out some parts and seeing if the tests become 100% reliable; I plan to start with the SQLite calls. But before I start down this long road, I was wondering if these symptoms ring a bell with anybody.

Maybe unrelated, anecdotal information:

  • I am building this code with -Doptimize=ReleaseSafe because the native backend gives me unhandled relocation type R_X86_64_PC64. All of the code is built like this – I have removed all the caches and recompiled many times.
  • The sqlite C code is vendored in a directory in my project, and being compiled from build.zig.
  • I have run the tests in a loop (with 5s pauses in between). I have seen tens / hundreds of runs with no issue, and then a SIGABRT. With this loop running, I closed my laptop lid, moved to another floor in the house and reopened; the tests started failing, repeatedly.

Thanks in advance for any wisdom shared. Cheers!


I think this piece of code is too optimistic about slicing the HTTP response. If the TCP connection closes early and the complete HTTP response never arrives from the socket, the status line is empty and this test will crash.
I think it is necessary to check the length defensively before slicing.
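
Something along these lines, as a minimal sketch: `parseStatusLine` and both error names are made up for illustration and are not the actual code in testing.zig. It assumes a status line of the form `HTTP/1.x NNN ...`, so the three status digits sit at bytes 9..12.

```zig
const std = @import("std");

/// Hypothetical helper (not from testing.zig): parse the status code out of
/// an HTTP/1.x status line, returning an error instead of panicking when the
/// line is too short to slice.
fn parseStatusLine(line: []const u8) !u16 {
    // "HTTP/1.1 200" is 12 bytes; anything shorter cannot contain the code.
    if (line.len < 12) return error.IncompleteResponse;
    if (!std.mem.startsWith(u8, line, "HTTP/1.")) return error.InvalidStatusLine;
    // Safe to slice now: the length was checked above.
    return std.fmt.parseInt(u16, line[9..12], 10);
}
```

With a check like this, an empty read surfaces as `error.IncompleteResponse` in the test instead of an index-out-of-bounds panic.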


Yes, that is correct. I already added a check in there for index out of bounds. But the reason it fails there in the first place is that code that ran earlier raised an ABRT signal.


I created an issue on the http.zig repo, and through the magic of rubber ducking the problem, I found the cause. It was basically a timeout that was too aggressive / a retry policy that was too humble.
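
For illustration only, here is a sketch of what a less humble retry policy could look like: exponential backoff with a cap. This is not the actual change; `backoffDelayMs` and its parameters are hypothetical names, and the values in the usage note are arbitrary.

```zig
const std = @import("std");

/// Hypothetical helper: delay in milliseconds before retry number `attempt`
/// (0-based). The delay doubles on every attempt and never exceeds `cap_ms`.
fn backoffDelayMs(attempt: u32, base_ms: u64, cap_ms: u64) u64 {
    // Clamp the shift so that 1 << shift always fits in a u64.
    const shift: u6 = @intCast(@min(attempt, 63));
    const factor = @as(u64, 1) << shift;
    // Saturating multiply: very large attempts hit the cap instead of overflowing.
    const delay = base_ms *| factor;
    return @min(delay, cap_ms);
}
```

With a base of 50 ms and a cap of 2000 ms, successive attempts would wait 50, 100, 200, 400 ms and so on, up to 2000 ms.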

Given recent discussions, I am happy to report that this whole thing was diagnosed using pure old-style human brains – no LLMs were allowed anywhere near the code. Which gives me the perfect place to use a quotation from The Matrix that has been in my head for a while now:

Holes? Nope. Me and my brother Dozer, we’re both 100% pure, old-fashioned, home-grown human, born free right here in the real world.

Cheers!
