Interested in Implementation of Big Integers (u256, u1024, u3141, etc.)

I wanted to write a regex crossword solver for fun in Zig, and so I’ve been doing all sorts of wacky stuff with like using a u256 to represent the set of candidates for a particular character.

So, I was curious about how this is actually implemented. For example, if I wrote:

const std = @import("std");

pub noinline fn foobarbaz() void {
    const x: u1024 = 1 << 1021;
    const y: u1024 = 1 << 1022;
    const z = x + y;
    std.debug.print("{}\n", .{z});
}

pub fn main() void {
    foobarbaz();
}

What would the assembly look like? I am not an assembly pro, but I would guess that we would “somehow allocate 1024 bits on the stack for x,y,z” and “somehow add them” but the “somehow add them” is the part I want to actually try and understand!

So with some help from the Zig discord I built my file like this:

zig build-obj main.zig -fstrip -O ReleaseSmall -femit-asm=main.a

and when I look at the assembly I see

main.foobarbaz:
	.cfi_startproc
	push	rbp
	.cfi_def_cfa_offset 16
	.cfi_offset rbp, -16
	mov	rbp, rsp
	.cfi_def_cfa_register rbp
	push	r15
	push	r14
	push	r13
	push	r12
	push	rbx
	sub	rsp, 2024
	.cfi_offset rbx, -56
	.cfi_offset r12, -48
	.cfi_offset r13, -40
	.cfi_offset r14, -32
	.cfi_offset r15, -24
	lea	rbx, [rbp - 960]
	lea	rsi, [rbp - 1024]
	push	64
	pop	rdx
	mov	rdi, rbx
	call	debug.lockStderr
	mov	r15, qword ptr [rbx]
	movabs	rax, 6917529027641081856
	mov	r10d, 1025
	xor	edi, edi
	mov	qword ptr [rbp - 144], 0
	mov	qword ptr [rbp - 200], 0
	mov	qword ptr [rbp - 176], 0
    ...omitting the rest

now, I look at this and the only part I recognize is push rbp and move rbp, rsp :stuck_out_tongue:
I don’t know if staring at this assembly and picking it apart is going to be useful, so I wanted to ask:

Actually, I had an idea about the 6917529027641081856.

Python 3.12.3 (main, Jan 22 2026, 20:57:42) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print(f"{6917529027641081856:b}")
110000000000000000000000000000000000000000000000000000000000000
>>> 

So perhaps it’s somehow taking that huge number and then trying to repeatedly multiply it by 2 until it gets $2^1021 + 2^1022$?

Because x y and z are all statically known constants I would expect that the addition happens as a part of the compilation process and that the resulting assembly just encodes the result in some way that llvm finds convenient.

For the addition to happen at runtime I think you need to have runtime inputs/variables.

1 Like

You’re absolutely right! (for all intents and purposes this is a joke please don’t ban me)

Anyways, changing the program to this:

const std = @import("std");

pub noinline fn foobarbaz(x: u1024, y: u1024) u1024 {
    return x + y;
}

pub fn main(process: std.process.Init) !void {
    var stdin_buffer: [1024]u8 = undefined;
    var stdin_reader = std.Io.File.stdin().reader(process.io, &stdin_buffer);

    var in = try stdin_reader.interface.takeDelimiterExclusive('\n');
    stdin_reader.interface.toss(1);
    const x = try std.fmt.parseInt(u1024, in, 10);
    in = try stdin_reader.interface.takeDelimiterExclusive('\n');
    stdin_reader.interface.toss(1);
    const y = try std.fmt.parseInt(u1024, in, 10);

    std.debug.print("{}\n", .{foobarbaz(x, y)});
}

(reading from stdin ripped straight from Right way to read user input from stdin in zig 0.15.2 )
and looking at the assembly I see:

main.foobarbaz:
	.cfi_startproc
	push	rbp
	.cfi_def_cfa_offset 16
	.cfi_offset rbp, -16
	mov	rbp, rsp
	.cfi_def_cfa_register rbp
	add	rsi, qword ptr [rbp + 104]
	mov	rax, rdi
	adc	rdx, qword ptr [rbp + 112]
	adc	rcx, qword ptr [rbp + 120]
	adc	r8, qword ptr [rbp + 128]
	mov	qword ptr [rdi + 16], rcx
	mov	rcx, qword ptr [rbp + 144]
	mov	qword ptr [rdi + 8], rdx
	mov	qword ptr [rdi], rsi
	mov	rsi, qword ptr [rbp + 152]
	mov	rdx, qword ptr [rbp + 160]
	adc	r9, qword ptr [rbp + 136]
	mov	qword ptr [rdi + 24], r8
	adc	rcx, qword ptr [rbp + 16]
	mov	qword ptr [rdi + 32], r9
	adc	rsi, qword ptr [rbp + 24]
	mov	qword ptr [rdi + 40], rcx
	adc	rdx, qword ptr [rbp + 32]
	mov	qword ptr [rdi + 48], rsi
	mov	rsi, qword ptr [rbp + 168]
	adc	rsi, qword ptr [rbp + 40]
	mov	qword ptr [rdi + 56], rdx
	mov	rdx, qword ptr [rbp + 176]
	adc	rdx, qword ptr [rbp + 48]
	mov	qword ptr [rdi + 64], rsi
	mov	rsi, qword ptr [rbp + 184]
	adc	rsi, qword ptr [rbp + 56]
	mov	qword ptr [rdi + 72], rdx
	mov	rdx, qword ptr [rbp + 192]
	adc	rdx, qword ptr [rbp + 64]
	mov	qword ptr [rdi + 80], rsi
	mov	rsi, qword ptr [rbp + 200]
	adc	rsi, qword ptr [rbp + 72]
	mov	qword ptr [rdi + 88], rdx
	mov	rdx, qword ptr [rbp + 208]
	adc	rdx, qword ptr [rbp + 80]
	mov	qword ptr [rdi + 96], rsi
	mov	rsi, qword ptr [rbp + 216]
	adc	rsi, qword ptr [rbp + 88]
	mov	qword ptr [rdi + 104], rdx
	mov	rdx, qword ptr [rbp + 224]
	adc	rdx, qword ptr [rbp + 96]
	mov	qword ptr [rdi + 112], rsi
	mov	qword ptr [rdi + 120], rdx
	pop	rbp
	.cfi_def_cfa rsp, 8
	ret

Which looks more approachable. I think I should go sleep now but I’ll try to understand what this does tomorrow. Thanks!