Inferred error sets

Why does Zig fail to infer error sets when two functions call each other recursively?

I am not familiar with compiler internals, but my guess would be that it collects all errors from a function, then goes into every function that this one calls with try and collects the errors from those as well, which would make it loop infinitely. But wouldn't keeping track of the functions whose errors were already collected, or keeping track of the function call stack, fix this?

I think it could do this in theory, but from what I have observed, Zig tries to keep the things it infers at a shallow level of complexity. I think this makes sense: if you only infer things in a shallow way, you already get most of the benefit of inference, without the downsides of very deep and increasingly expensive inference.
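To make the visited-set idea from the question concrete, here is a toy sketch in Python. It is only a model of the idea, not how the Zig compiler actually works; the call graph and the function names (parseExpr, parseTerm, readByte) are made up for illustration.

```python
# Toy model only: this is NOT how the Zig compiler represents functions.
# Each (hypothetical) function maps to:
#   (errors it returns directly, functions it calls with `try`)
CALL_GRAPH = {
    "parseExpr": ({"UnexpectedToken"}, ["parseTerm"]),
    "parseTerm": ({"Overflow"}, ["parseExpr", "readByte"]),
    "readByte": ({"EndOfStream"}, []),
}


def infer_error_set(name, graph, visited=None):
    """Union of all errors reachable from `name`, visiting each function once."""
    if visited is None:
        visited = set()
    if name in visited:
        # Already being collected elsewhere in this walk: nothing new to add.
        return set()
    visited.add(name)
    direct_errors, callees = graph[name]
    result = set(direct_errors)
    for callee in callees:
        result |= infer_error_set(callee, graph, visited)
    return result


print(sorted(infer_error_set("parseExpr", CALL_GRAPH)))
print(sorted(infer_error_set("readByte", CALL_GRAPH)))
```

So the walk does terminate, and the result for the function you started from is complete. The catch is that the partial set computed for parseTerm during the walk that started at parseExpr is missing UnexpectedToken, so you cannot simply cache a result per function as you go. The functions in a cycle have to be resolved together as one unit.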

(Some of the downsides: code that is harder to reason about and understand locally, complexity hidden in long chains of reasoning or cause and effect, a more complex implementation, and worse performance, because of repeated work, more contention, and less ability to trivially parallelize when more functions need to be considered together.)

Getting the compile error forces the programmer to insert a cut somewhere in the inference graph. That turns it into something simpler that the simple algorithm can infer, and the simple algorithm is likely faster and easier to implement and understand.
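Here is a rough sketch of what I mean by a cut, again as a toy model in Python and not the real compiler logic. The function names and error names are hypothetical. A function with a declared error set is resolved without looking at its callees, so declaring one set anywhere in the cycle is enough for the simple algorithm to finish.

```python
class InferenceLoop(Exception):
    """Raised when an inferred error set depends on itself."""


# Toy model only. Each function maps to:
#   (declared error set or None, errors returned directly, callees)
# None means the set is inferred, like writing `!T` in Zig.
# The model does not check that a declared set covers the function body.
def resolve(name, graph, in_progress=()):
    declared, direct_errors, callees = graph[name]
    if declared is not None:
        # Explicit error set: known without visiting any callee.
        return set(declared)
    if name in in_progress:
        raise InferenceLoop(" -> ".join(in_progress + (name,)))
    result = set(direct_errors)
    for callee in callees:
        result |= resolve(callee, graph, in_progress + (name,))
    return result


all_inferred = {
    "parseExpr": (None, {"UnexpectedToken"}, ["parseTerm"]),
    "parseTerm": (None, {"Overflow"}, ["parseExpr"]),
}

# Same graph, but parseExpr now declares its error set: that is the cut.
with_cut = dict(all_inferred)
with_cut["parseExpr"] = (
    {"UnexpectedToken", "Overflow"},
    {"UnexpectedToken"},
    ["parseTerm"],
)

try:
    resolve("parseTerm", all_inferred)
except InferenceLoop as loop:
    print("dependency loop:", loop)

print(sorted(resolve("parseTerm", with_cut)))
```

With everything inferred, the resolver gives up and reports the chain parseTerm -> parseExpr -> parseTerm. With the one declared set on parseExpr, parseTerm resolves without ever re-entering the cycle.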

That said, I think it would be nice if the compiler could emit the compile error and then keep going: do the deep inference and give the user a hint, maybe a few candidate functions where they could declare an explicit error set to break the loop apart (and also which errors would be in that error set). Getting such suggestions, maybe as notes on the error, could improve the user experience.

I think getting an error is still a good thing, because it means you can't build huge spaghetti monsters with crazy complex inferred error sets and super complex call graphs. You are eventually forced to add some explicit error sets, and I think that is good. Otherwise you would end up with huge codebases where you spend hours trying to figure out which functions can return which errors, or where your tooling wastes a lot of compute re-analyzing the code to work out what the possible error sets are.

If Zig were less based on textual code and leaned more towards structured editing, I could imagine an argument that these inferred error sets could just be cached alongside the code, so the tooling wouldn't have to recompute them. But I also think it makes sense to just write some error sets down explicitly in the code, instead of supporting inference that goes so deep that you start thinking about how to cache it to avoid computing it repeatedly.

Or, said another way: at some point you really should be thinking about your error sets and what is allowed where. In my experience, mutually recursive functions are a good place to start thinking about your error sets and about the best and most appropriate points to declare one explicitly. It also makes these kinds of functions a whole lot easier to read when you can directly see at least some of their error sets.

If everything were inferable, you would be able to write programs that just end up as a mess where anything can return anyerror (because the whole program becomes one spaghetti ball). So this is also a good point of friction that incentivizes you to organize your program into regions, with borders that have explicit error sets, which makes it easy to understand and use.
