Posix.sendto causes "panic: reached unreachable code" on Windows

FObersteiner · July 31, 2024, 2:35pm

Hi all. I haven’t done much Zig in the last few weeks, and the one thing I wanted to tackle fails on Windows. That kind of bugs me ^^

So… I recently tried to clean up main() of my NTP query tool by moving the actual call to the NTP server into a separate function. That works fine on Linux. Now I wanted to use the tool on Windows. Everything worked in version 0.0.16 of the tool, where I had all the code crammed into one big main function. With the change, it still compiles fine, but running the executable fails:

>>> D:\Software\zig-windows-x86_64-0.14.0-dev\zig build -Dexe
>>> .\zig-out\bin\ntp_client.exe -s 192.168.0.1

thread 9336 panic: reached unreachable code
D:\Software\zig-windows-x86_64-0.14.0-dev\lib\std\posix.zig:6029:31: 0xf9d91a in sendto (ntp_client.exe.obj)
                .WSAEFAULT => unreachable, // The lpBuffers, lpTo, lpOverlapped, lpNumberOfBytesSent, or lpCompletionRoutine parameters are not part of the user address space, or the lpTo parameter is too small.
                              ^
D:\Code\Zig\ntp_client\src\main.zig:141:25: 0xf9ca72 in sample_ntp (ntp_client.exe.obj)
    _ = try posix.sendto(
                        ^
D:\Code\Zig\ntp_client\src\main.zig:101:50: 0xfa3316 in main (ntp_client.exe.obj)
            const result: ntp.Result = sample_ntp(&sock, &dst, &buf, proto_vers) catch |err| switch (err) {
                                                 ^
D:\Software\zig-windows-x86_64-0.14.0-dev\lib\std\start.zig:540:75: 0xfa7eaa in main (ntp_client.exe.obj)
    return callMainWithArgs(@as(usize, @intCast(c_argc)), @as([*][*:0]u8, @ptrCast(c_argv)), envp);
                                                                          ^
D:\Software\zig-windows-x86_64-0.14.0-dev\lib\libc\mingw\crt\crtexe.c:267:0: 0x1045a90 in __tmainCRTStartup (crt2.obj)
    mainret = _tmain (argc, argv, envp);

D:\Software\zig-windows-x86_64-0.14.0-dev\lib\libc\mingw\crt\crtexe.c:188:0: 0x1045aeb in mainCRTStartup (crt2.obj)
  ret = __tmainCRTStartup ();

???:?:?: 0x7ffbd1ba7373 in ??? (KERNEL32.DLL)
???:?:?: 0x7ffbd311cc90 in ??? (ntdll.dll)

Can anybody give me a hint about what’s going on here? Why does this work fine when the sendto / recvfrom code is in main, but not when I put it into a separate function? Thanks in advance.

mnemnion · July 31, 2024, 3:31pm

Sadly, I know next to nothing about Windows and can’t assist here.

I’m here to say what I said last time, and what I’ll say next time: this is standard library code hitting an unreachable branch at run time. That shouldn’t happen, and it will keep happening in std.posix, because there is no way to guarantee by construction that the host system won’t return a particular magic number. These branches aren’t unreachable at all. Look! Behold, code which reached one.

In ReleaseFast or ReleaseSmall builds, this would be undefined behavior. I’ll stop kvetching about this when there’s an accepted issue tracking how to fix it.

dimdin · July 31, 2024, 4:14pm

Two suggestions:

1. Add a print statement before sendto that prints the value of dst_addr_len.
2. Try running it with DebugView: run dbgview64.exe as Administrator and then start your application. DebugView might display the failure reason.
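To follow the first suggestion in isolation, here is a minimal Zig sketch. It assumes the destination is held in a `std.net.Address` (API as of Zig 0.13/0.14-dev) and that `dst_addr_len` comes from `getOsSockLen()`; the tool's actual code isn’t quoted in the thread, so both are assumptions. `logDest` is a made-up helper that prints the family and the length an IPv4 versus an IPv6 destination would hand to sendto.

```zig
const std = @import("std");

/// Hypothetical debug helper: print the address family and the sockaddr
/// length that would be passed to sendto as the destination length.
fn logDest(addr: std.net.Address) void {
    std.debug.print("dst family = {d}, dst_addr_len = {d}\n", .{ addr.any.family, addr.getOsSockLen() });
}

test "compare destination lengths" {
    // 123 is the NTP port.
    logDest(try std.net.Address.parseIp("192.168.0.1", 123)); // sockaddr_in: 16 bytes
    logDest(try std.net.Address.parseIp("::1", 123)); // sockaddr_in6: 28 bytes
}
```

Running `zig test` on this file should print a length of 16 for the IPv4 destination and 28 for the IPv6 one; that is the number worth comparing against what the socket expects.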

FObersteiner · July 31, 2024, 5:06pm

Thanks, I’ll give this a try tomorrow!

FObersteiner · July 31, 2024, 5:09pm

Do you think I should open an issue on GitHub, so that reached unreachables can be tracked better?
I think I’ll need to get a better understanding of what’s going on first, though…

mnemnion · July 31, 2024, 6:12pm

I think both of those are good ideas. Figure out what’s going on with your code, if you can, because the unreachable may well be triggered by something going wrong on your end when you moved the code into its own function.

But “runtime code reached unreachable in the standard library” is itself a bug, so you either have “this code should be correct but isn’t” or you have “this is buggy code, but it also triggers an unreachable branch”. Once you figure out which one it is, I encourage you to add it to the tracker. There are a few variations already, but yours is likely to be new, because the nature of the problem is that there are a great many ‘unreachable’ branches in std.posix. If you do find an identical issue, adding a comment to it is appropriate as well.

I know the core team needs to prioritize their time. My hope with all this is that a basic design which doesn’t have the problem can be worked out and moved into the “accepted” / “contributor friendly” stage of the process. I would gladly dedicate a few hours of some weekend to fixing this up, but it’s not entirely clear to me what should be done.

squeek502 · July 31, 2024, 7:30pm

Zig currently marks errors as unreachable if it thinks they can only be reached through “programmer error.” I tend to agree with you, and I’ve said before that I think Zig will want (or need) to use less and less unreachable in the standard library over time, but in this case it’s a distraction. The OP’s code wouldn’t suddenly work if the unreachable were replaced with return error.Something.
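To make the distinction concrete, here is a toy sketch of the two styles. `WsaCode` and `error.BadAddressBuffer` are invented for illustration; only the value 10014 for WSAEFAULT is taken from Winsock. In both cases the send has still failed; the second style only changes how the failure reaches the caller.

```zig
const std = @import("std");

/// Hypothetical subset of Winsock error codes (WSAEFAULT is 10014).
const WsaCode = enum(u16) {
    success = 0,
    wsaefault = 10014,
};

/// Style of the quoted posix.zig line: in Debug and ReleaseSafe this panics
/// with "reached unreachable code"; in ReleaseFast/ReleaseSmall reaching it
/// is undefined behavior.
fn handleAsUnreachable(code: WsaCode) void {
    switch (code) {
        .success => {},
        .wsaefault => unreachable,
    }
}

/// Alternative style: surface the same code as a Zig error the caller can
/// catch or propagate. The error name is made up for this sketch.
fn handleAsError(code: WsaCode) error{BadAddressBuffer}!void {
    switch (code) {
        .success => {},
        .wsaefault => return error.BadAddressBuffer,
    }
}

test "only the error style lets the caller handle WSAEFAULT" {
    handleAsUnreachable(.success);
    try handleAsError(.success);
    try std.testing.expectError(error.BadAddressBuffer, handleAsError(.wsaefault));
}
```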

squeek502 · July 31, 2024, 8:41pm

I hit the same unreachable if I move the code back into main, so it’s not related to the function call.

Stepping through with a debugger, the parameters to posix.sendto look identical between 0.0.16 and the main branch, so I’m unsure what’s going on here.

squeek502 · July 31, 2024, 9:07pm

Found the issue. It’s this:

src_ip: []const u8 = "0::0",

In 0.0.16, that was:

src_ip: []const u8 = "0.0.0.0", // TODO : should this be 0::0 / IPv6 by default?

If I change that back to 0.0.0.0 in the main branch, posix.sendto succeeds.

Unsure exactly why that’s the case (not very familiar with these networking APIs), but that should give you something to go on.
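A quick way to see what the default change does is to parse both strings and look at the resulting address family. This sketch assumes the tool turns `src_ip` into a `std.net.Address` with `parseIp`, which the thread doesn’t confirm; `familyName` is a hypothetical debug helper.

```zig
const std = @import("std");
const posix = std.posix;

/// Hypothetical helper: human-readable family of a parsed address.
fn familyName(addr: std.net.Address) []const u8 {
    return switch (addr.any.family) {
        posix.AF.INET => "IPv4",
        posix.AF.INET6 => "IPv6",
        else => "other",
    };
}

test "old and new src_ip defaults parse to different families" {
    const old_default = try std.net.Address.parseIp("0.0.0.0", 0);
    const new_default = try std.net.Address.parseIp("0::0", 0);
    std.debug.print("0.0.0.0 -> {s}, 0::0 -> {s}\n", .{ familyName(old_default), familyName(new_default) });
}
```

Both strings mean “any local address”, but the old default yields an IPv4 address and the new one an IPv6 address, so the socket bound to it changes family too.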

dimdin · July 31, 2024, 9:35pm

:: is IPv6 shorthand: in "0::0" it means “fill everything in between with zeros”, so "0::0" is the same all-zeros address as "::".
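The expansion can be checked directly. A minimal sketch using `std.net.Address.parseIp6`, where `isUnspecifiedIp6` is a hypothetical helper that tests whether an address is IPv6 with all 16 bytes zero:

```zig
const std = @import("std");

/// Hypothetical helper: true if `addr` is IPv6 and all 16 address bytes are
/// zero, i.e. the address written "::".
fn isUnspecifiedIp6(addr: std.net.Address) bool {
    if (addr.any.family != std.posix.AF.INET6) return false;
    return std.mem.allEqual(u8, &addr.in6.sa.addr, 0);
}

test "0::0 is the same address as ::" {
    try std.testing.expect(isUnspecifiedIp6(try std.net.Address.parseIp6("0::0", 0)));
    try std.testing.expect(isUnspecifiedIp6(try std.net.Address.parseIp6("::", 0)));
    try std.testing.expect(isUnspecifiedIp6(try std.net.Address.parseIp6("0:0:0:0:0:0:0:0", 0)));
}
```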

squeek502 · July 31, 2024, 9:40pm

Right, I’m aware of that. I’m unsure why a socket created with the IPv6 address is causing WSASendTo to fail with WSAEFAULT, whereas a socket created with an equivalent IPv4 address works fine.

dimdin · July 31, 2024, 9:48pm

The socket is bound to an IPv6 address (the 0::0 that is our source address). Then it tries to sendto an IPv4 address: the destination is expected to be IPv6 (a 28-byte sockaddr_in6), but the buffer size is only that of an IPv4 address (a 16-byte sockaddr_in). That is the “lpTo parameter is too small” case of WSAEFAULT.
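One way to avoid the mismatch is to derive the bind address from the destination instead of using a fixed default. A minimal sketch, assuming the destination is already a `std.net.Address`; the helper name `anySourceFor` is made up. In the real tool the socket would presumably also need to be created with the same family before the result is passed to `posix.bind`.

```zig
const std = @import("std");
const posix = std.posix;

/// Hypothetical helper (not in std): choose a wildcard source address with
/// the same family as `dst`, so bind() and sendto() agree on IPv4 vs IPv6.
/// Returns null for families other than IPv4/IPv6.
fn anySourceFor(dst: std.net.Address) ?std.net.Address {
    return switch (dst.any.family) {
        posix.AF.INET => std.net.Address.initIp4(.{ 0, 0, 0, 0 }, 0),
        posix.AF.INET6 => std.net.Address.initIp6([_]u8{0} ** 16, 0, 0, 0),
        else => null,
    };
}

test "source family follows destination family" {
    const dst = try std.net.Address.parseIp("192.168.0.1", 123);
    const src = anySourceFor(dst).?;
    try std.testing.expect(src.any.family == posix.AF.INET);
    try std.testing.expect(src.getOsSockLen() == dst.getOsSockLen());
}
```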

FObersteiner · August 1, 2024, 6:31am

That’s it! First of all, thanks again for finding the issue. I was naive in believing that sending from a v6 socket would work for a v4 target address, as it does on Linux. Changing more than one thing at a time tends to go wrong if things are brittle to start with; I just gave myself another example… I have to admit, I forgot about that change (defaulting to IPv6).

While I might have learned something here, this is not the first time I’ve reached an unreachable. Despite the fact that a user of the std lib (me) wrote code that reached an incorrect state (which should not be reached), it feels like there is room for improvement in terms of user-friendliness. Zig has a very good error model IMHO. What is the actual error here? Tell the user: this is what you did wrong. Then they have a chance to fix the root cause, or catch it (as I do for ‘attempt to send to IPv6 from IPv4’, btw). Is this the direction the Zig std lib wants to evolve in? I mean, in principle it’s better to prevent users from being stupid in the first place, but there are probably countless cases where you realistically cannot achieve this.

squeek502 · August 1, 2024, 7:45am

I’m in favor of fewer unreachables (tending towards zero) in the standard library for handling system error codes, but this was actually pretty clear in your original stack trace:

.WSAEFAULT => unreachable, // The lpBuffers, lpTo, lpOverlapped, lpNumberOfBytesSent, or lpCompletionRoutine parameters are not part of the user address space, or the lpTo parameter is too small.

That’s the error code (WSAEFAULT) and a comment explaining why it’s marked unreachable (it’s a quote from the WSASendTo docs). If a regular Zig error was returned in this case, you’d get (at best) the same amount of information, since all Zig has to work with is a WSAEFAULT error code.

FObersteiner · August 1, 2024, 8:38am

Yeah, it’s not an easy task, I guess. My understanding now is that unreachable is a placeholder, since it is very much reachable; the user just needs to be naive enough. The fact that it crashes the program is OK in the sense that you have something to debug in any case, no matter whether an error is returned or not. Still, I wonder if there’s something to make the experience smoother? If, on the other hand, the OS API only yields a certain amount of “usable” info, I don’t see a way to improve on that, except to check a lot of possible misconfigurations up front and return an error with more information content.
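The “check misconfiguration up front” idea could look roughly like this. The error name `AddressFamilyMismatch` and the helper `checkSameFamily` are invented for illustration and are not part of std; the local address is assumed to be the one that was passed to `bind`. Calling such a check before `posix.sendto` would turn the WSAEFAULT panic into an ordinary Zig error with a descriptive name.

```zig
const std = @import("std");

/// Hypothetical error set for a pre-flight check; not part of std.
const PreflightError = error{AddressFamilyMismatch};

/// Check, before calling posix.sendto, that the bound (local) address and
/// the destination share an address family. On Windows a mismatch otherwise
/// shows up as WSAEFAULT, which std.posix.sendto treats as unreachable.
fn checkSameFamily(local: std.net.Address, dst: std.net.Address) PreflightError!void {
    if (local.any.family != dst.any.family) return error.AddressFamilyMismatch;
}

test "mismatch is reported as an error instead of a panic" {
    const dst = try std.net.Address.parseIp("192.168.0.1", 123);
    try checkSameFamily(try std.net.Address.parseIp("0.0.0.0", 0), dst);
    try std.testing.expectError(
        error.AddressFamilyMismatch,
        checkSameFamily(try std.net.Address.parseIp("0::0", 0), dst),
    );
}
```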