For context, I’ve been working on quantizing floating point numbers to 8-bit for math operations (think matrix multiplies for example). I’ve found a few projects that I’m considering rewriting in Zig. Here’s one that I’ve been reading:
That got me thinking about pulling together some more resources to get deeper into this subject in general, though. So I'm curious: does anyone have resources or projects they particularly like on this topic? I'm happy to hear any explanations and experiences people have as well.
Awesome - and I’ll always make time to listen to a Mike Acton talk (haven’t read much of his writing but I can imagine it’s probably good).
My impression so far is that there isn't anything particularly special about floating-point quantization in ML itself… but there are techniques for picking scaling values that may not make much sense outside that context. For instance, it's common to run a small calibration sample of the operations first, find min/max values (averaged) for each tensor, and then just reuse those as fixed scales instead of requantizing every time during evaluation/training. I've put a rough sketch of that idea below.
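To make that concrete, here's a minimal sketch in Zig of what I mean by calibration-based per-tensor quantization. Everything in it (the `QuantParams` struct, `calibrate`, `quantize`, `dequantize`, and the affine min/max mapping onto `i8`) is just my own illustration of the general idea, not code from any particular project:

```zig
const std = @import("std");

// Per-tensor affine quantization parameters derived from a calibration pass.
// real_value ≈ scale * (q - zero_point)
const QuantParams = struct {
    scale: f32,
    zero_point: i8,
};

// "Calibration": scan a small sample of a tensor's values, record min/max,
// and derive a fixed scale/zero-point from that range so we don't have to
// requantize on every pass.
fn calibrate(sample: []const f32) QuantParams {
    var min: f32 = sample[0];
    var max: f32 = sample[0];
    for (sample) |v| {
        if (v < min) min = v;
        if (v > max) max = v;
    }
    // Map [min, max] onto the 256 representable i8 values (asymmetric/affine).
    const scale = if (max > min) (max - min) / 255.0 else 1.0;
    const zp_f = std.math.clamp(@round(-128.0 - min / scale), -128.0, 127.0);
    const zero_point: i8 = @intFromFloat(zp_f);
    return .{ .scale = scale, .zero_point = zero_point };
}

fn quantize(x: f32, p: QuantParams) i8 {
    const q = @round(x / p.scale) + @as(f32, @floatFromInt(p.zero_point));
    return @intFromFloat(std.math.clamp(q, -128.0, 127.0));
}

fn dequantize(q: i8, p: QuantParams) f32 {
    return (@as(f32, @floatFromInt(q)) - @as(f32, @floatFromInt(p.zero_point))) * p.scale;
}

pub fn main() void {
    // Pretend this slice is a small calibration sample of one tensor's values.
    const sample = [_]f32{ -0.9, -0.1, 0.0, 0.25, 1.3 };
    const params = calibrate(&sample);

    const x: f32 = 0.5;
    const q = quantize(x, params);
    std.debug.print("x={d} q={d} deq={d:.4}\n", .{ x, q, dequantize(q, params) });
}
```

In a real matmul you'd quantize both operands this way, accumulate in a wider integer type, and dequantize (or rescale) the result, but the point here is just that the scale/zero-point come from a one-time calibration sample rather than being recomputed per batch.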
I'm probably going to do an example project in Zig to get some practical experience. If you (or others) are interested in that, I'll make a public repo and post it here in case anyone wants to write some code.