Resources on (very) low level network programming

kj4tmp · April 27, 2025, 1:21am

I need some learning resources on implementing low-level network code. Things like how TCP/IP stacks are implemented.

I’m not talking about how to use my OS’s TCP/IP stack, I’m talking about how the OS TCP/IP stack is written.

For example, resources that could help me understand the ramifications of choosing between the following ways I could represent an Ethernet frame:

pub const EthernetFrame = struct {
    header: Header,
    data: []const u8,
    n_padding: u8,
};

pub const EthernetFrame = struct {
    header: Header,
    data: []u8,
    n_padding: u8,
};

pub const EthernetFrame = struct {
    header: Header,
    data: []u8,
};

pub const EthernetFrame = struct {
    header: Header,
    data: std.BoundedArray(u8, 1500),
};

vulpesx · April 27, 2025, 8:03am

It would probably help to know what standard you are implementing, otherwise I can only find conflicting information.

With my (very limited) knowledge, I’d go with the second-to-last struct, but with a const slice. The amount of padding can be derived from the data length and the header size.
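To make the "padding can be derived" point concrete, here is a minimal sketch. It assumes untagged Ethernet II / IEEE 802.3 frames (46-byte minimum payload, 1500-byte maximum, no VLAN tag, no jumbo frames); the constant names and `paddingLen` are made up for illustration and do not come from any library.

```zig
const std = @import("std");

// Assumed limits for untagged frames:
// 64-byte minimum frame - 14-byte header - 4-byte FCS = 46 bytes of payload.
const min_payload_len: usize = 46;
const max_payload_len: usize = 1500;

/// Number of padding bytes that must follow `payload_len` bytes of data
/// so the frame reaches the minimum size on the wire.
pub fn paddingLen(payload_len: usize) usize {
    std.debug.assert(payload_len <= max_payload_len);
    return if (payload_len < min_payload_len) min_payload_len - payload_len else 0;
}
```

With something like this, the frame struct does not need an `n_padding` field: the value is recomputed from `data.len` whenever the frame is serialized.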

matklad · April 27, 2025, 9:42am

I’d start with perusing source code of

Zambyte · April 27, 2025, 3:24pm

There is no point in adding padding to struct types, because struct does not have a stable binary representation. The compiler is free to rearrange (maybe even omit?) fields to try to optimize the layout. Using this kind of struct is useful as a high level representation of the protocol, but the consequence of this is that you will have to convert and write each field individually (perhaps using comptime, but considering the stability and small number of fields, it’s probably better to just write each manually).

If you want a type that you can just dump as bytes, you should consider a packed struct instead. However, packed structs can only contain fields that themselves have a stable binary representation, which does not include slices, arrays, or structs (non-packed, non-extern). It does include @Vector, so you can use that for fixed length buffers.
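As a small illustration of the packed struct idea, here is a sketch that views the first byte of an IPv4 header as two 4-bit fields. `VersionIhl` and `parseVersionIhl` are hypothetical names, and the single-argument `@bitCast` assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// First byte of an IPv4 header. Packed struct fields are laid out from
/// the least significant bit upward, so `ihl` (the low nibble) comes first.
pub const VersionIhl = packed struct(u8) {
    ihl: u4, // header length in 32-bit words
    version: u4, // 4 for IPv4
};

/// Reinterpret one byte as the packed struct; no per-field conversion needed.
pub fn parseVersionIhl(byte: u8) VersionIhl {
    return @bitCast(byte);
}
```

For multi-byte integer fields the byte order still has to be handled separately, since the packed layout follows the host's integer representation rather than network byte order.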

I personally think the best path is to just use a regular struct, and provide your own toBytes and fromBytes that will convert the header and data slice to the proper binary representation. You can see how I have done something similar here (also deals with endianness of the message payload):

and for comparison, I also have a C struct that I use to expose the same API (don’t take this as gospel, just an initial stab at it):
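The linked code is not reproduced in this thread. Separately from it, here is a minimal sketch of the toBytes/fromBytes approach for an Ethernet II header, so the idea can be seen in one place. This is not the poster's code: the `Header` layout, the `wire_len` constant, and the function signatures are assumptions, and the `.big` endian tag assumes Zig 0.12 or later.

```zig
const std = @import("std");

pub const Header = struct {
    dst: [6]u8,
    src: [6]u8,
    ether_type: u16, // kept in host byte order in memory

    /// Size of the header on the wire (untagged Ethernet II).
    pub const wire_len = 14;

    /// Write each field explicitly; multi-byte integers go out big-endian.
    pub fn toBytes(self: Header, out: *[wire_len]u8) void {
        out[0..6].* = self.dst;
        out[6..12].* = self.src;
        std.mem.writeInt(u16, out[12..14], self.ether_type, .big);
    }

    /// Inverse of toBytes: read each field back from its fixed offset.
    pub fn fromBytes(in: *const [wire_len]u8) Header {
        return .{
            .dst = in[0..6].*,
            .src = in[6..12].*,
            .ether_type = std.mem.readInt(u16, in[12..14], .big),
        };
    }
};
```

Because every field is written at an explicit offset, the in-memory field order of `Header` no longer matters, which is what makes a regular struct safe to use here.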

kj4tmp · April 28, 2025, 12:27am

The question is mostly about what the overall system design is. How many buffers, when to use linked lists etc. How zero copy is accomplished, etc.

Looking for book recommendations or other in-depth resources.

TwoClocks · April 28, 2025, 1:10am

Stevens’s “UNIX Network Programming” is a good place to start. It not only goes over the details of on-wire data layouts (and how to represent them in structs), but also covers quite a bit of a standard stack implementation. So is “TCP/IP Illustrated”. It gets more into switches and routing, but it is mandatory reading for any low-level network programmer.

There are a lot of edge cases in network programming… even for users. The behavior of the various stacks can vary quite a bit, although most of them started as a fork of the BSD stack. There are user-mode versions you can run on top of DPDK, etc. It’s the grandmother stack… maybe dig around in there?

I’d also look at all the Linux stack tuning stuff in /proc/sys/net. That’ll give you insight into how the kernel uses memory for its stack implementation.

TCP back-pressure and flow control are a bit of a black art. There are lots of papers on the topic. Linux lets you choose between a few different congestion control algorithms. Reno and Vegas are good places to start.

Zero copy kinda requires you to abandon standard BSD-type socket APIs. recv needs to return a pointer, not take a destination buffer, to be truly zero copy. For this reason zero-copy network APIs are pretty bespoke and are normally specific to a hardware NIC. AMD’s Solarflare and Nvidia’s ConnectX cards both support kernel bypass with user-mode stack implementations (OpenOnload and libvma respectively). They also both have zero-copy network APIs: Onload’s TCPDirect and VMA’s SocketXtreme. There are likely others, but these are the two big ones I’m aware of. Both are open source and on GitHub.
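The difference between the two API shapes can be sketched in a few lines. This is a toy model, not any vendor's API: `RxSlot`, `recvCopy`, and `recvView` are hypothetical names, and a real NIC would own a ring of many such buffers plus a way to hand them back.

```zig
const std = @import("std");

/// Toy stand-in for one receive buffer that a NIC would DMA into.
pub const RxSlot = struct {
    storage: [2048]u8 = undefined,
    len: usize = 0,

    /// BSD-style recv: the caller supplies the destination buffer,
    /// so the payload has to be copied out of the receive buffer.
    pub fn recvCopy(self: *const RxSlot, dest: []u8) usize {
        const n = @min(dest.len, self.len);
        @memcpy(dest[0..n], self.storage[0..n]);
        return n;
    }

    /// Zero-copy style recv: the caller gets a view into the receive
    /// buffer itself and must finish with it before the slot is reused.
    pub fn recvView(self: *const RxSlot) []const u8 {
        return self.storage[0..self.len];
    }
};
```

The second form is why such APIs end up bespoke: the caller now shares buffer lifetime with the driver, which the classic socket interface has no way to express.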

io_uring on Linux has some zero-copy stuff, but it’s only in fairly recent versions, and only on writes (I think). It’s probably the closest thing to a standard ZC API ATM.

You can use DPDK and eBPF to do zero copy from the NIC itself. But neither has a TCP/IP implementation built in… so you’d have to write your own networking stack, or cobble one together. Honestly, if you’re trying to learn this stuff, writing your own stack is a good way to do it. There are some verification suites out there to test against. There is a TON of old stuff in the spec that nobody uses any more… like TCP urgent (priority) data.

I’m not sure any of this is answering your question though? I feel like maybe you’re looking for something else?

buzmeg · April 28, 2025, 2:20am

For big machines, I would dig into the FreeBSD TCP/IP code. FreeBSD has been noted historically for having better networking code than Linux (don’t know if that still applies).

Also, aren’t there some userspace TCP/IP stacks? I seem to remember Linux kernel bypass being all the rage for a while. They may be convoluted in order to bend themselves around getting every last drop of networking performance at the expense of everything else.

However, if you are going small, it seems like the options are pretty much only lwIP and FreeRTOS-TCP (courtesy of Amazon). Maybe uIP. However, at that level it all kinda sucks.

“UNIX Network Programming” covers the TCP/IP state machine, but it also leaves out a lot.

For your own sanity, when you write your communication stack, the low level functions need to have state in, state out, bytes in, bytes out, and time as an input (trust me–time as an input is super, Super, SUPER important for making a communication stack deterministic).

You want to be able to completely deterministically create test cases so that you can “run” your stack solely in software without involving null or passthrough or hardware devices.
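A minimal sketch of what "state in, state out, time as an input" can look like, using a single retransmission timer. Everything here is hypothetical (`State`, `Event`, `Action`, `step`, and the fixed 200 ms timeout), and events stand in for the bytes in and bytes out of a real stack. The point is only that `step` never reads a clock or touches a device.

```zig
const std = @import("std");

pub const State = struct {
    unacked: bool = false, // a segment is in flight
    deadline_ms: u64 = 0, // when to retransmit it
    retransmits: u32 = 0,
};

pub const Event = enum { send, ack, tick };
pub const Action = enum { none, transmit, retransmit };
pub const Result = struct { state: State, action: Action };

/// Assumed fixed retransmission timeout, for illustration only.
pub const rto_ms: u64 = 200;

/// Pure transition function: the same (state, event, now_ms) always gives
/// the same result, because the caller passes the current time in.
pub fn step(state: State, event: Event, now_ms: u64) Result {
    var next = state;
    var action: Action = .none;
    switch (event) {
        .send => {
            next.unacked = true;
            next.deadline_ms = now_ms + rto_ms;
            action = .transmit;
        },
        .ack => next.unacked = false,
        .tick => if (state.unacked and now_ms >= state.deadline_ms) {
            next.retransmits += 1;
            next.deadline_ms = now_ms + rto_ms;
            action = .retransmit;
        },
    }
    return .{ .state = next, .action = action };
}
```

A test can then replay any timeline, including timeouts that would take minutes on real hardware, by feeding in made-up timestamps.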

slonik-az · April 29, 2025, 1:25am

The author is W. Richard Stevens. The “Unix Network Programming” book comes in two volumes (volume 1 and volume 2) and is a great book on networking. He also wrote “TCP/IP Illustrated” and “Advanced Programming in the UNIX Environment”, which is my favorite.
The guy was a legend. Unfortunately he passed away in 1999 at the age of 48. For more info, see his Wikipedia page, “W. Richard Stevens”.