Common sub-expression elimination (CSE) in Zig

joelreymont · January 29, 2025, 5:39am

I’d like to implement common sub-expression elimination (CSE) for a compiler (transpiler, really) that I wrote in Zig.

The key is detecting that an expression has been used before. Other languages use sorted maps that can be keyed on expressions but Zig only seems to have hash maps.

Do I build a hash function for my expression node and use a hash map?

How does the Zig compiler do it? Please point me to the source if you can.

Thank you, Joel

AndrewKraevskii · January 29, 2025, 10:27am

By sorted maps do you mean search tree? If so zig std has std.Treap (wiki).

pierrelgol · January 29, 2025, 10:32am

I think that this kind of optimization, is easier to be done when you have an ir that uses sea of nodes or ssa. then it becomes trivial to detect, that all path from a*b leads to c.

A Simple Graph-Based Intermediate Representation

joelreymont · January 29, 2025, 10:39am

Does the Zig compiler do it or is it left to LLVM?

Where in the Zig compiler code is this being done?

pierrelgol · January 29, 2025, 1:11pm

That I don’t know at all, I’m by no mean a compiler expert, I’m very interested in the domain, and I’ve read a lot about it, but I don’t know how the Zig compiler works at this level of granularity, My best guess would be that there is probably some form of CSE in Zir or Air. And there is definitely some in LLVM.

But SSA, I think is probably the simplest approach if you want to implement CSE. The reason is that SSA create a unique single use variable for each operation. Because variables are guaranteed by the format to be uniquely defined. You can easily determine, that if two points within the program, share the same variable, they therefore share the same value.

There is also the Phi functions. At each control-flow merges, the phi functions allow you to merge values, from different branches, without redefining the same variable.

So with all that for CSE all you need to do is to have a CFG (control flow graph), and determine if two sub-expressions, depend of the same unique set of definitions.

Static Single Assignment Form and the Control Dependence Graph

joelreymont · January 29, 2025, 1:25pm

The issue I’m facing is detecting function calls with the same arguments.

These may be assigned to different variables as there’s no requirement for it to be done differently. I would like to call these functions only once so I need to keep a tree keyed on f(a, b) and similar.

pierrelgol · January 29, 2025, 1:39pm

The problem here is that you first need to determine the purity of the function, to safely elide calls to that function. at least that’s my understanding of it, if you treat function call as expressions, where each arguments are unique_versioned variants. than it shouldn’t be too hard to see that the expression node for that function is the same in other places. I’m not knowledgeable enough to say how to practically implement it, but at a high level this is how I think it is implemented.

LucasSantos91 · January 29, 2025, 1:39pm

I believe the standard approach is to use SSA and then apply local value numbering. This works for any subexpression, including function calls.