Why does the compiler need to know the size of types at compile time?

I’m new to languages that require the developer to be aware of memory details, so apologies for the noob questions, but I was wondering: why is it useful for the compiler to know how much memory a type takes at compile time?

Is this used for some optimization? If so, what kinds?

Can one infer the size of the binary produced from the size of the types? That is, will a program using the i128 type produce a bigger binary compared to a program that uses u32?

Or does this “knowing of types” only affect how much memory the program uses at runtime? If it is only about memory usage at runtime, how? I mean, if I have a function with all the types known at compile time, how does this affect memory usage at runtime?

I mean, dynamic languages don’t have the luxury of a compile time to know the types and yet, they work…so what is the difference?

Someone else can probably give a much better answer, but these two concepts might be of interest as a starting point:


Almost all operations in a program are expressed in terms of fixed offsets from a given starting point. For example my_array[8] translates to pointer_of_array + (size_of_element * 8) in machine code.
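The offset arithmetic above can be reproduced by hand. The following is a minimal sketch in Python using the standard ctypes module; the names my_array and element_address are made up for illustration, and the element type c_uint64 is just an assumption standing in for any fixed-size type.

```python
import ctypes

# A fixed-size array of ten 8-byte unsigned integers (roughly [10]u64 in Zig).
ArrayType = ctypes.c_uint64 * 10
my_array = ArrayType(*range(100, 110))


def element_address(array, index):
    """Compute pointer_of_array + (size_of_element * index) by hand."""
    pointer_of_array = ctypes.addressof(array)
    size_of_element = ctypes.sizeof(array._type_)
    return pointer_of_array + size_of_element * index


# Read the element straight from the computed address and compare it
# with normal indexing.
address = element_address(my_array, 8)
value_at_address = ctypes.c_uint64.from_address(address).value
print(value_at_address == my_array[8])
```

The calculation only works because the element size is a fixed number known up front. If elements could have different sizes, there would be no way to jump directly to element 8.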

Dynamic languages achieve dynamism through an abstraction that is still built on this same baseline principle.

In Python, for example, every value is handled through a single struct type called PyObject (this is how CPython, the reference interpreter, does it). So when in Zig you would do:

const numbers: [3]usize = .{1,2,3};

In Python this is what happens:

# You write:
numbers = [1,2,3]

# The interpreter behind the scenes does (using pseudo-Zig syntax)
var numbers: []*PyObject = .{ 
   &PyObject{ .number = 1 },
   &PyObject{ .number = 2 },
   &PyObject{ .number = 3 },
};

And if you created a mixed list in Python, such as [1, "foo", False], this is what the interpreter does:

var numbers: []*PyObject = .{ 
   &PyObject{ .number = 1 },
   &PyObject{ .string = "foo" },
   &PyObject{ .boolean = false },
};

Here I’m using syntax that suggests that PyObject is a union, which is functionally correct. In CPython it is really a small common header (a reference count and a pointer to the object’s type) that every concrete object struct begins with, but the effect is the same: the list itself only ever stores fixed-size pointers.

So, no, in fact dynamic languages tend to have only one actual type, which they use to bootstrap dynamic typing as a higher-level abstraction.
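One way to see this from inside Python is to measure the list itself. The sketch below assumes CPython (sys.getsizeof and the exact allocation behavior are implementation details, and other interpreters may differ); bytes_per_slot is a hypothetical helper written for this post.

```python
import ctypes
import sys

# Size of a native pointer on this machine.
POINTER_BYTES = ctypes.sizeof(ctypes.c_void_p)


def bytes_per_slot():
    """Estimate how many bytes each list slot takes (CPython-specific)."""
    short = [None] * 1
    long = [None] * 1001
    return (sys.getsizeof(long) - sys.getsizeof(short)) // 1000


# Two lists of the same length: one with a single kind of value, one mixed.
same_kind = list((1, 2, 3))
mixed = list((1, "foo", False))

print(bytes_per_slot() == POINTER_BYTES)
print(sys.getsizeof(same_kind) == sys.getsizeof(mixed))
```

The list takes the same space whether it holds three integers or an integer, a string, and a boolean, because each slot is just a pointer to an object stored elsewhere.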


Just wanted to add that some statically typed languages have built-in types that behave like that too - https://www.freepascal.org/docs-html/ref/refsu18.html

That is, will a program using the i128 type produce a bigger binary compared to a program that uses u32?

Generally yes, because most computers do not support 128-bit operations in hardware. In order to add two 128-bit numbers, you have to emulate it with the operations your machine does have, and typically that means 64-bit additions. (For the lawyers out there, this paragraph is about ordinary scalar instructions, not SIMD.)
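To make the emulation concrete, here is a rough sketch of a wrapping 128-bit addition built from 64-bit pieces. It is written in Python only for illustration; add_u128, split, and join are hypothetical helper names, and a real compiler would emit something like an add followed by an add-with-carry rather than this code.

```python
MASK64 = (1 << 64) - 1


def split(x):
    """Split a 128-bit value into (low, high) 64-bit halves."""
    return x & MASK64, (x >> 64) & MASK64


def join(halves):
    """Combine (low, high) 64-bit halves back into one value."""
    low, high = halves
    return low | (high << 64)


def add_u128(a, b):
    """Add two 128-bit values given as (low, high) pairs, wrapping on overflow."""
    a_low, a_high = a
    b_low, b_high = b
    low_sum = a_low + b_low
    low = low_sum & MASK64      # result of the first 64-bit add
    carry = low_sum >> 64       # 1 if the low half overflowed, else 0
    high = (a_high + b_high + carry) & MASK64  # second add, plus the carry
    return low, high


x = (1 << 64) - 1  # largest value that fits in the low half
print(join(add_u128(split(x), split(1))) == x + 1)
```

One logical addition becomes two additions plus carry handling, which is where the extra instructions in the binary come from.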

However, binary size isn’t the main reason; it’s more about control, which gets you predictable speed and memory characteristics. Say you add two integers together in Python, or you use BigInts in JavaScript. How long will it take? How much memory will it take? Well, it depends on the inputs. On the other hand, you can add two i128s together and look at how that compiles on your target architecture. Let’s say it takes 4 instructions that each take one cycle to execute. There you go: 4 cycles is how long it will take every time. How much memory does it take? 128 bits, by definition.
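The “it depends on the inputs” part is easy to observe. This small sketch assumes CPython, where sys.getsizeof reports the bytes used by an object; the exact numbers vary by version and platform, so it only compares them.

```python
import ctypes
import sys

small = 1
large = 2 ** 200

# A Python int grows as its value grows, so its size is only known at runtime.
print(sys.getsizeof(small), sys.getsizeof(large))

# A fixed-width integer type always has the same size, whatever value it holds.
print(ctypes.sizeof(ctypes.c_uint32), ctypes.sizeof(ctypes.c_uint64))
```

With a fixed-width type, the second line of output is the same for every program and every input, which is exactly the guarantee a compiler relies on.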

You could also use BigInts in Zig, or you could use 8-, 16-, 32-, or 64-bit integers, or any other width that may not be directly supported in hardware but can be emulated. In Zig, you have the option to use any arbitrary bit width you want. In a language like JS, on the other hand, you get f64s, i32s, and BigInts. If you know how your optimizer works under the hood, you might have some other specific integer options too, but that’s another story.

All of the control that Zig gives you adds up. I’ve done Leetcode problems in a scripting language before and there are plenty of cases where you have to work around what is allowed by the language whereas if you were allowed to do whatever you want, you could just do what you’re trying to do. That’s one of the main features of languages in the category of C-replacement languages.
