Bugs Rust Won’t Catch - Zig Strings

I found the blog post “Bugs Rust Won’t Catch” (corrode Rust Consulting) extremely interesting in light of Zig’s choice not to have a String type. Many of the bugs in Rust’s uutils came from using String, with its default UTF-8 encoding, for paths instead of the more correct array of u8.

Strings are usually a big pain point for most Zig users, and only after reading this article did I better understand why the standard library has no String type.
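To make the path point concrete, here is a minimal Rust sketch (Unix-only, standard library only) of what goes wrong when a path is pushed through a UTF-8 string. The file names and the `lossy_name` helper are made up for illustration and are not taken from uutils.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: view an OsStr as raw bytes

/// Hypothetical helper: the lossy, display-only form of a raw path.
fn lossy_name(raw: &[u8]) -> String {
    OsStr::from_bytes(raw).to_string_lossy().into_owned()
}

fn main() {
    // Two different file names that are legal on Unix but not valid UTF-8.
    let a: &[u8] = b"report-\xff.txt";
    let b: &[u8] = b"report-\xfe.txt";

    // As bytes they are distinct paths.
    assert_ne!(a, b);
    // After lossy conversion both invalid bytes become U+FFFD, so they collide.
    assert_eq!(lossy_name(a), lossy_name(b));
    // The strict conversion refuses instead of silently changing the name.
    assert!(OsStr::from_bytes(a).to_str().is_none());

    println!("{}", lossy_name(a));
}
```

The lossy form is only safe for showing a name to a human; anything that goes back to the OS should keep the original bytes.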


Regarding the ‘comm’ CVE, perhaps std.fs.path.fmtAsUtf8Lossy should be nuked so as not to invite the same kind of path formatting problem. If someone insists, they can always reach into std.unicode, but at least then it’s not in the fs.path namespace.


I’d argue that’d be learning the wrong lesson. IMO the lesson is really “you need to think things through when it comes to paths”; making any blind choice can and likely will lead to problems.

My full thoughts can be found here:

This bit is particularly relevant:


Interesting blog post. I particularly like the acknowledgement of panic being a DDOS vector. Anybody remember the Cloudflare outage?

What’s also interesting is the discussion about bugs not shipped. It’s the standard fare of buffer overruns and null pointer dereferencing which Zig is also going to be pretty strong on.

That means, even if the tools were (and probably still are) buggy, they never had a bug that could be exploited to read arbitrary memory.

Ok, but in commands like pwd, ls, split, and od, what information would you expect an attacker to extract through a buffer overrun exploit? They don’t have access to anything except what you fed into them.

What’s left is, frankly, a more interesting class of bug. It lives at the boundary between our controlled Rust environment and the messy, chaotic outside world, where paths, bytes, strings, and syscalls are all tangled up in one eternal ball of sadness. That’s the new security boundary of modern systems code.

That was always the security boundary, which is what the chroot bug in the Rust re-write shows.


Maybe, but since, as you state, there’s no canonical way to format arbitrary paths as UTF-8, having this one way in std still makes it easy to reach for when one perhaps shouldn’t. Perhaps clearer doc strings about the potential issues would be a compromise.

FWIW, my personal lesson here is that good API design can’t prevent people from writing bad code. API-design-wise, Rust is great! It is very pedantic about what are strings, what are bytes, and what are OS-dependent paths. And the API SCREAMS at you if you use the wrong thing.

Like

// ra, rb are &[u8], raw bytes from the input files.
print!("{}", String::from_utf8_lossy(ra));
print!("{delim}{}", String::from_utf8_lossy(rb));

Who would’ve written that?! The only way this could be more obviously wrong is if your terminal gave an audible beep when you typed it and the computer rebooted angrily. https://www.cve.org/CVERecord?id=CVE-2026-35375 is at the same level of obvious brokenness.

But, yeah, a lot of people do routinely write code like that. And people who don’t, arguably, don’t benefit that much from extra API type-safety.

I would still say that, on the margin, appropriate API salt & sugar can help, but that is surprisingly less effective than one would think.
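For contrast with the snippet above, a minimal sketch of the lossless alternative: write the raw bytes straight to the output instead of formatting them through a string. `write_pair` and the sample records are hypothetical names for illustration, not the actual uutils fix.

```rust
use std::io::{self, Write};

/// Hypothetical helper: write both records unchanged, separated by `delim`.
fn write_pair<W: Write>(out: &mut W, ra: &[u8], delim: &[u8], rb: &[u8]) -> io::Result<()> {
    out.write_all(ra)?;    // bytes go out exactly as they came in
    out.write_all(delim)?;
    out.write_all(rb)
}

fn main() -> io::Result<()> {
    let ra: &[u8] = b"caf\xe9\n"; // Latin-1 encoded, not valid UTF-8
    let rb: &[u8] = b"plain\n";

    let stdout = io::stdout();
    let mut out = stdout.lock();
    write_pair(&mut out, ra, b"\t", rb)
}
```

Nothing here is harder to write than the lossy version, which is rather the point: the types already separate bytes from strings, and the bug only appears once someone converts anyway.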

It seems plausible that Zig’s transparent treatment of paths as bytes makes Zig programs more correct on Unix, but I am not sure that it doesn’t just exchange some bugs for others.

E.g., if I run zig fetch non-ascii-bytes, it sends an HTTP request with an invalid URL (so, violating the protocol, as far as I understand) instead of rejecting it early. Similarly, I’d be surprised if there are no subtle bugs with Zig making WTF-8 on Windows visible to users.


The first chapter, “Don’t Trust a Path Across Two Syscalls”, is very important and not talked about enough, I think.

We should be teaching people in schools to never use a path twice if they wish to access the same file. Always prefer a file descriptor if you’re doing more than a single operation. In fact, working with paths in general should be avoided as much as possible if a different facility can do the job, because there are many edge cases and it’s very error prone, not to mention sensitive to OS changes. It’s generally just better to leave it to the OS to do it once, and then operate on the object that the OS gives you.
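A minimal Rust sketch of the "open once, then use the handle" idea, assuming a plain regular file. `read_with_size` and the demo file name are hypothetical. The racy variant would call `fs::metadata(path)` and then `fs::read(path)`, resolving the path twice.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: resolve the path once, then ask the handle everything else.
fn read_with_size(path: &Path) -> io::Result<(u64, Vec<u8>)> {
    let mut file = File::open(path)?;  // the only path lookup
    let size = file.metadata()?.len(); // queries the open handle, not the path
    let mut data = Vec::new();
    file.read_to_end(&mut data)?;      // reads from that same handle
    Ok((size, data))
}

fn main() -> io::Result<()> {
    // Demo file name is made up for this example.
    let path = std::env::temp_dir().join("fd_not_path_demo.txt");
    std::fs::write(&path, b"hello")?;

    let (size, data) = read_with_size(&path)?;
    println!("{size} bytes reported, {} bytes read", data.len());

    std::fs::remove_file(&path)
}
```

If the path is swapped for something else after the open, both the size and the contents still describe the file that was actually opened.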

In that same vein, I really dislike the “Resolve Paths Before Comparing Them” rule from the blog. No. Instead of comparing paths, they should be calling stat on them and comparing inodes. User space resolution of paths in general is just not worth the hassle.
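A minimal sketch of that comparison in Rust (Unix-only), assuming "same file" means same device and same inode number. `same_file` and the demo file names are hypothetical.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: dev() and ino()
use std::path::Path;

/// Hypothetical helper: true if both paths currently resolve to the same file.
fn same_file(a: &Path, b: &Path) -> io::Result<bool> {
    // fs::metadata follows symlinks, so the OS does all the path resolution.
    let (ma, mb) = (fs::metadata(a)?, fs::metadata(b)?);
    // Inode numbers are only unique per device, so compare both fields.
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let original = dir.join("same_file_demo_a"); // made-up demo names
    let link = dir.join("same_file_demo_b");

    let _ = fs::remove_file(&link); // ignore "not found" from a clean start
    fs::write(&original, b"x")?;
    fs::hard_link(&original, &link)?;

    println!("{}", same_file(&original, &link)?);

    fs::remove_file(&link)?;
    fs::remove_file(&original)
}
```

This still looks each path up once, so where later operations depend on the answer, it is safer to open both files and compare the metadata of the open handles instead.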

I’m honestly a bit baffled that someone would set out to replace coreutils without this knowledge, which I feel is kind of crucial for this sort of endeavour.

EDIT: For these reasons I really take issue with the part of the post that says the codebase was “written by people who knew what they were doing”. I’m sorry, but no, it really wasn’t.


…and yet they shipped it as default in 25.10 (before this review), and are still pushing it to be the only version available in the future. 🤦 IMHO Ubuntu needs to revert to GNU coreutils and sudo.

I don’t understand why the Rust community has such damaging blind spots.


Note that “Rust community” != “The set of people who historically worked on implementing rust coreutils” != “The set of people that decided to ship rust coreutils with Ubuntu”.

More generally, there isn’t such a thing as a “programming language community”, and there’s especially no such thing as a “programming language community that can actually solve coordination problems and act cohesively with intention”.


I just fail to see the rationale to push for uutils so much, other than “hurr, durr, C unsafe bad, Rust safe good”[1]. IF there were frequent bugs stemming from memory safety issues in coreutils, I would kinda sorta understand, but for a project of its age they seem to be rather few and far between. I realize I’m probably preaching to the choir here, but am I missing something?


  1. But please do note that I have nothing in particular against Rust; I just don’t really work with it at the present time.

I can see some:

For having uutils:

  • It’s MIT-licensed rather than GPL
  • It’s usable as a library (by, for example, nushell)

For carefully replacing coreutils in a distro with a Rust version:

  • unsafe bad 🙂 While older code generally has fewer vulns, this might also be a consequence of us being bad at discovering vulnerabilities? Given the recent deluge of AI-discovered weaknesses everywhere, I think this take aged way better than anyone could have imagined. I mean, given “Copy Fail: 732 Bytes to Root on Every Major Linux Distribution” (Xint), I would only be moderately surprised to learn that you can pwn GNU cat with tricky arguments.
  • I don’t think coreutils are completely frozen? They do get new features, and, with some amount of expected future development, it might make sense to go to Rust

For replacing coreutils in a distro with an obviously half-baked version:

  • Gets massively sketchier from here 😛 But I can see how just throwing the code out there and making it everyone’s problem massively accelerates development speed, assuming we do want to migrate eventually. Yes, there’ll be a rocky start, but there’s only going to be a finite number of bugs, quickly fixed. Especially given that Rust is so much easier to hack on than older GNU projects.