Doing heap things in build.zig

I never really got into the possibilities of build.zig. I really should do that soon.
I know we can do lots of things in there.

So question…

I am thinking about creating a “tiny” version of my chess engine as a fun side project.
The executable size of an “official tiny engine” is 4096 bytes max.

My “big” chess engine has a lot of compile time computed constants (lots of arrays).
That is blowing up the executable size of course.

So the plan is to stuff all these constants in a struct on the heap.
Is it possible to fill this heap-struct inside build.zig, thus avoiding runtime code. And somehow let this code execute at compiletime?
(Runtime code of course also consumes executable space).

Any other crazy tips are also welcome of course.

you can add values, but no clue how far one can get.

for instance, we put a version tag from git to build as

    // add build options to runtime
    const options = b.addOptions();
    exe.root_module.addOptions("build", options);

    // build: version
    const args = &[_][]const u8{ "sh", "-c", "git describe --exact-match --tags HEAD 2>/dev/null || echo \"$(git rev-parse --abbrev-ref HEAD)-$(git rev-parse --short HEAD)\"" };
    const version = b.run(args);
    options.addOption([]const u8, "version", version);

and later use it in the app accessing as `@import(“build”).version`.

std itself is available in the build, don’t see a reason why std.heap is not

2 Likes

I don’t think so? since the memory exists on the heap at runtime it must be set at runtime. it can either be set by copying bytes from rodata (i.e. potentially increasing the binary size), or by executing code that writes to that memory (i.e. potentially slowing down the execution of your program).

3 Likes

Are you sure your comptime code is the problem? Generally things should only show up in your executable’s data/rodata sections if runtime code actually references it, so moving your comptime code to build.zig and referencing it with @embedFile or by generating zig code that you @import isn’t likely to change the executable size.

One thing you could do more easily if you compute your arrays via build.zig is to compress them. Then when your program starts it can take the compressed byte arrays from @embedFile and decompress them into heap memory. Technically you could probably achieve the compression with just comptime code too, but it would likely be painfully slow to compile.

That said, 4K is positively tiny! I don’t think even a zig hello world compiled with ReleaseSmall ends up that small on most platforms, so you probably won’t be able to afford a deflate decompressor. You could take a look at simpler compression algorithms though. I came up with one a few years back for use on highly resource-constrained microcontrollers that might work well if the data isn’t super random.

1 Like

That small size makes me think it may make more sense to look into the demo scene (or old games) and what they are using, my guess would be lots of formulas and (deterministic) pseudo randomly generated data mixed with reused data that is reinterpreted in multiple ways and probably also hand crafted assembly, but I haven’t looked into what they actually use. (just heard some cool anecdotes like using the program code as textures)

1 Like

Would some sort of ‘domain-specific’ data compression be possible, e.g. how “noisy” is the data in those tables?

E.g. compute the tables in build.zig apply this ‘magic’ data compression, save to files, pull those files in via @embed (hopefully keeping the exe size small enough because the data is compressed), and then decompress into heap-tables (which hopefully is faster and less code than doing the full computation from scratch).

…or is there a sensible way to split the table computation into an expensive offline-part (which creates little intermediate data baked into the exe via @embed), and then ‘finish’ the computation in a runtime-code part at startup into heap tables?

…but IME: when you have code that generates the data tables, this is usually already the best “compression algorithm”. Usually nothing beats data-generation-code when it comes to “data density”.

Another crazy idea: build your own domain-specific virtual instruction set, write the table computation in this virtual “machine code”, and run the whole table computation at startup via a minimal interpreter (only makes sense if the virtual program + the interpreter is smaller than the native code which computes the table data - and it will probably be a lot slower). But AFAIK this technique of a special VM is used in scene demos which need to keep executable size down, but need to generate a lot of data.

PS: another idea: can the table computation be described by much fewer parameters and a generic algorithm? E.g. only store the parameters in the executable and run the algorithm to populate the heap tables. A little bit how in 3D rendering spherical harmonics might replace a whole cubemap for image-based lighting with just 9(?) parameters.

4 Likes

Some of the data can be embedded and compressed I think.

But not this one… here you can see some of the comptime computed const arrays.

"this is usually already the best “compression algorithm”
In my ideal world it would be great if that @comptime functions could fill a heap allocated struct if you know what i mean…

I will setup a skeleton and check…

Another general idea that may or may not work for you: do the computations at build time, and then generate a zig file that’s compiled normally by the build system and which does the thing you want at runtime (like putting the computed constants on heap)

Another idea would be to go down the rabbithole of self-modifying code and somehow generate the code that generates the tables at runtime(maybe even one recursion more). This, totally based on intuition, seems doable because your table generation code looks quite repetitive.

I just did a ReleaseSmall on a completely empty main() and it was already 5KB.
Either I need to dive into rabbitholes or Zig is not the right language to do this

You can get below that, but you’ll probably need to drop down to syscalls rather than using std.

If your goal is really under 4K you should be looking into going directly to the entry point i.e. defining _start yourself.

Have a look at docs and start.zig

2 Likes

Note that I’m including the std start code as well when I say std.

On my machine, the following compiles to 904 bytes with ReleaseSmall:

const std = @import("std");

pub export fn _start() noreturn {
    std.os.linux.exit(0);
}
1 Like

880 bytes if I use -fomit-frame-pointer, in which case it just compiles down to:

$ objdump -d exit

exit:     file format elf64-x86-64


Disassembly of section .text:

00000000010011a4 <.text>:
 10011a4:	6a 3c                	push   $0x3c
 10011a6:	58                   	pop    %rax
 10011a7:	31 ff                	xor    %edi,%edi
 10011a9:	0f 05                	syscall
1 Like

Aha aha. Ok. Rabbitholes but doable rabbitholes maybe :slight_smile: Thx

1 Like

What number do you get with -OReleaseSmall -fomit-frame-pointer -fno-unwind-tables -fstrip -dead_strip followed by strip -R .comment <binary> ? That used to be effective, especially -fno-unwind-tables

2 Likes

FYI:

zig build-exe main.zig -OReleaseSmall
904
zig build-exe main.zig -OReleaseSmall -fomit-frame-pointer
880
zig build-exe main.zig -OReleaseSmall -fomit-frame-pointer -fno-unwind-tables
600
zig build-exe main.zig -OReleaseSmall -fomit-frame-pointer -fno-unwind-tables -fstrip
600
zig build-exe main.zig -OReleaseSmall -fomit-frame-pointer -fno-unwind-tables -fstrip -dead_strip
600
strip -R .comment main
504

Then we have the following headers:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000001001120  00000120
       0000000000000007  0000000000000000  AX       0     0     4
  [ 2] .shstrtab         STRTAB           0000000000000000  00000127
       0000000000000011  0000000000000000           0     0     1

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000001000040 0x0000000001000040
                 0x00000000000000e0 0x00000000000000e0  R      0x8
  LOAD           0x0000000000000000 0x0000000001000000 0x0000000001000000
                 0x0000000000000120 0x0000000000000120  R      0x1000
  LOAD           0x0000000000000120 0x0000000001001120 0x0000000001001120
                 0x0000000000000007 0x0000000000000007  R E    0x1000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000001000000  RW     0x0

Meaning the headers alone are 480 = 64 + 3*64 + 4*56 bytes. (amd64)

If I’m remembering correctly everything but the ELF header and a single program header could be cut; so the minimum would be 120 = 64 + 56 bytes.

4 Likes

Not too shabby!

3 Likes