Unified memory models and AddressSpace

I think if std.builtin.Type.Pointer.address_space is different, they would always be different memory areas? (Unless they map to the same memory, but then it seems like a matter of knowing what gets mapped where in memory, and that seems OS/hardware dependent?)
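At the type level they would at least always be distinct pointer types, whatever the memory layout turns out to be. A minimal sketch checking that at comptime (assuming an x86-64 target, where the gs address space is valid; GsConstI32 is just a throwaway name here):

const std = @import("std");

// The same pointer layout as *const i32, but placed in the gs address space.
const GsConstI32 = @Type(.{
    .Pointer = .{
        .size = .One,
        .is_const = true,
        .is_volatile = false,
        .alignment = @alignOf(i32),
        .address_space = .gs,
        .child = i32,
        .is_allowzero = false,
        .sentinel = null,
    },
});

test "different address_space means a different pointer type" {
    comptime std.debug.assert(GsConstI32 != *const i32);
}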

1 Like

When a segment register isn’t being used, it’s typically set to zero. Pointers in its memory space would point to the same things as generic pointers:

const std = @import("std");

/// A one-item const pointer to T, but in the x86 gs segment's address space.
pub fn GSPointer(comptime T: type) type {
    return @Type(.{
        .Pointer = .{
            .size = .One,
            .is_const = true,
            .is_volatile = false,
            .alignment = @alignOf(T),
            .address_space = .gs,
            .child = T,
            .is_allowzero = true,
            .sentinel = null,
        },
    });
}

var number: i32 = 1234;

pub fn main() void {
    const ptr1: *i32 = &number;
    const ptr2: GSPointer(i32) = @ptrFromInt(@intFromPtr(ptr1)); // same address, reinterpreted as a gs pointer
    std.debug.print("{d}\n", .{ptr2.*});
}
1234
1 Like

I don’t really know how .address_space works. Does somebody have references that explain the details?

1 Like

x86 segment registers are legacy items from the bad old days of 16-bit programming. We stopped using them like 30 years ago, except for obscure things like thread-local storage.

I’m curious about the purposes of the other address spaces.

1 Like

It is for different address spaces such as GPU memory, microcontroller flash, etc.
The possible values are listed in std.builtin.AddressSpace.
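If you want to see the full list for your compiler version, here is a quick sketch that just prints the enum’s field names (the exact set of values depends on the Zig release):

const std = @import("std");

pub fn main() void {
    // std.builtin.AddressSpace is a plain enum, so we can walk its fields
    // at comptime with @typeInfo and print each name.
    inline for (@typeInfo(std.builtin.AddressSpace).Enum.fields) |field| {
        std.debug.print("{s}\n", .{field.name});
    }
}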

2 Likes

I haven’t paid much attention to GPU development. You mean we still haven’t gotten to the point of a unified memory model?

Having a unified memory model is not some kind of panacea. Each model has pros and cons.
A CPU can have different address spaces for instructions and data; that is called the Harvard architecture. The von Neumann architecture shares program and data memory.
For GPUs there is a model where both the GPU and the CPU can use the same pointer to access memory, and another model where the CPU and GPU have distinct address spaces.
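For what it’s worth, std.builtin.AddressSpace already has entries for the “distinct address spaces” style of GPU (and microcontroller) memory, e.g. .global, .shared, .local and .flash. A minimal sketch reusing the same @Type construction as the gs example above; these spaces are only accepted when compiling for a matching target, such as a GPU backend:

const std = @import("std");

// Throwaway helper: a one-item pointer to T in the given address space.
fn PointerIn(comptime T: type, comptime space: std.builtin.AddressSpace) type {
    return @Type(.{
        .Pointer = .{
            .size = .One,
            .is_const = false,
            .is_volatile = false,
            .alignment = @alignOf(T),
            .address_space = space,
            .child = T,
            .is_allowzero = false,
            .sentinel = null,
        },
    });
}

// Distinct pointer types for distinct GPU memory spaces (only meaningful,
// and only accepted by the compiler, on a GPU target):
const GlobalPtr = PointerIn(f32, .global); // device/global memory
const SharedPtr = PointerIn(f32, .shared); // on-chip shared memory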

2 Likes

Well, the pro is that a unified memory model is easier for humans to understand. The con is that such a model imposes considerable technical challenges, regarding memory coherence, for example. The only way to balance out the needs of humans and machines is for us to get a hundred percent of what we want. We aren’t capable of getting smarter, after all.

I would greatly recommend reading the book Operating Systems: Three Easy Pieces, in particular its Address Spaces chapter. The book walks you through how and why the operating system works the way it does.
I’m currently reading it in preparation for writing my own kernel, and it’s a fantastic resource.

So the TL;DR is that your code, when it gets loaded, lives in virtual memory, giving your program the illusion that it has the entire memory at its disposal. Virtualization was introduced as a means to improve the safety/reliability of computers, because the address space tells the OS which range of pages/segments/memory your process is allowed to access.

The address space is the user-level abstraction that allows this. Your program consists of multiple segments/pages (code/text, heap, stack; whether it’s segments, pages, or a hybrid depends on the system), and depending on the implementation they might have their own address space, their own page table, etc.
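A minimal sketch of that layout as seen from inside a program: printing the virtual addresses of objects living in those different regions. The exact values and their ordering are OS/linker dependent, and page_allocator is just standing in for “the heap” here:

const std = @import("std");

var global_data: u32 = 42; // global/data segment

pub fn main() !void {
    var stack_var: u32 = 7; // lives in main's stack frame
    const heap_mem = try std.heap.page_allocator.alloc(u8, 16); // fresh pages from the OS
    defer std.heap.page_allocator.free(heap_mem);

    std.debug.print("code : 0x{x}\n", .{@intFromPtr(&main)});
    std.debug.print("data : 0x{x}\n", .{@intFromPtr(&global_data)});
    std.debug.print("heap : 0x{x}\n", .{@intFromPtr(heap_mem.ptr)});
    std.debug.print("stack: 0x{x}\n", .{@intFromPtr(&stack_var)});
}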

When you load from an address, the virtual address from your address space has to go through the CPU’s MMU (Memory Management Unit). The MMU breaks that address down into its two components, the page number and the page offset. It then checks the TLB (Translation Lookaside Buffer) to see if the translation is already cached; if not, the page tables are consulted (by hardware on most systems, or by the OS on systems with a software-managed TLB). If the page is not present, a page fault is raised, giving control back to the OS, which will try to find the page in memory.
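As a toy example of that split, assuming 4 KiB pages (the common case on x86-64; real translations then go through several levels of page tables):

const std = @import("std");

pub fn main() void {
    const page_size: usize = 4096; // 4 KiB pages => 12 offset bits
    const addr: usize = 0x7fff_dead_b123; // some made-up virtual address

    const page_number = addr / page_size; // equivalently: addr >> 12
    const page_offset = addr % page_size; // equivalently: addr & 0xfff

    std.debug.print("page number 0x{x}, offset 0x{x}\n", .{ page_number, page_offset });
}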

If the page isn’t there, the OS looks on disk in the swap space. If it doesn’t find it there either, your program gets terminated; if it does find it, it verifies that this page of memory may be accessed by your process, and then the actual value is loaded.

There are more details that I might have missed, and the inner workings depend on the actual implementation of the OS you are using (some systems use only segmentation, some only use paging, some use a hybrid, some use multi-level page tables, some place the code section at the bottom of the address space, some don’t; it really depends), but I think it generally holds true.

2 Likes