Is there a performance overhead to operating on big slices?

Let’s say I can establish an upper bound for the memory my program consumes, but that bound is much larger than the actual amount of memory I expect any non-degenerate input to the program to consume.

An example of this could be something like a tokenizer/lexer - we can have at most one token per character in the input, so an upper bound is easily established, but it is extremely unlikely that any given input will consist of only single-character tokens.

Obviously pre-allocation is a good pattern for performance, but is there a performance overhead to pre-allocating the upper bound and passing around a slice of that (probably extremely large) buffer, compared to a situation where I could magically allocate exactly the right amount of memory at the start of the program?

Remember that a slice is always the same size in memory (a pointer + a length). So the question here is really about the overhead of allocating a large chunk of memory at once.
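
To make that concrete, here is a tiny test; nothing in it depends on the buffer’s size, only on a slice being a { pointer, length } pair:

```zig
const std = @import("std");

test "a slice is a pointer plus a length, regardless of backing size" {
    // A slice is exactly { ptr, len } on any target.
    try std.testing.expectEqual(2 * @sizeOf(usize), @sizeOf([]u8));
    try std.testing.expectEqual(2 * @sizeOf(usize), @sizeOf([]u64));

    // A slice over a large buffer is no bigger than a slice over a small one;
    // passing it around copies only those two fields, never the buffer.
    const big = try std.testing.allocator.alloc(u8, 16 * 1024 * 1024);
    defer std.testing.allocator.free(big);
    const small = try std.testing.allocator.alloc(u8, 16);
    defer std.testing.allocator.free(small);
    try std.testing.expectEqual(@sizeOf(@TypeOf(big)), @sizeOf(@TypeOf(small)));
}
```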

Of course there will be some overhead from allocating more memory than is really needed, but if you cannot determine exactly how much you need, that overhead is a bit of a sunk cost. Your alternative is to allocate dynamically as needed, but then you run into the overhead of multiple system calls and possible reallocation(s).
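
As a sketch of the pattern the question describes (the Token type and the toy tokenize function here are made up for illustration): allocate the worst case once, then pass around a slice of the prefix you actually used.

```zig
const std = @import("std");

// Hypothetical token type, purely for illustration.
const Token = struct { start: usize, len: usize };

/// Toy "lexer": every run of non-space bytes becomes one token. `tokens` is
/// the pre-allocated worst-case buffer (one token per input byte); the return
/// value is a slice of the prefix that was actually filled.
fn tokenize(tokens: []Token, source: []const u8) []Token {
    var count: usize = 0;
    var i: usize = 0;
    while (i < source.len) {
        while (i < source.len and source[i] == ' ') i += 1;
        if (i == source.len) break;
        const start = i;
        while (i < source.len and source[i] != ' ') i += 1;
        tokens[count] = .{ .start = start, .len = i - start };
        count += 1;
    }
    return tokens[0..count];
}

test "pre-allocate the upper bound, pass around a slice of the used prefix" {
    const source = "the upper bound is one token per byte";
    const allocator = std.testing.allocator;

    const buf = try allocator.alloc(Token, source.len); // worst case
    defer allocator.free(buf);

    const toks = tokenize(buf, source);
    try std.testing.expect(toks.len < buf.len); // normal input uses far fewer slots
}
```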

4 Likes

If you are effectively doing parsing or lexing, on-demand parsing could reduce your memory consumption.
Another option is to use real-world data to make an educated guesstimate. Bad input data in your lexing example is more likely to come from an attacker/hacker trying to break the program than from legitimate users anyway.

1 Like

This is where a good middle ground is preferable. You can establish a reasonable estimate (the Zig Ast estimates roughly 1 token per 8 bytes of source link). This allows for an up-front allocation that covers the normal case.

For pathological cases, the overhead of reallocating may be worth it. If someone is doing something dumb with the program, they can pay for it with more syscalls and reallocations. You do want to be careful that this doesn’t impact normal users in things like webservers, though.
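
A rough sketch of that middle ground, assuming a toy Token type and the roughly-8-bytes-per-token figure mentioned above (ArrayListUnmanaged initialization details vary a little between Zig versions):

```zig
const std = @import("std");

const Token = struct { start: usize, len: usize }; // hypothetical token type

fn lex(allocator: std.mem.Allocator, source: []const u8) !std.ArrayListUnmanaged(Token) {
    var tokens: std.ArrayListUnmanaged(Token) = .{};
    errdefer tokens.deinit(allocator);

    // Up-front allocation for the normal case: assume roughly one token per
    // 8 bytes of source. Typical inputs never allocate again after this.
    try tokens.ensureTotalCapacity(allocator, source.len / 8 + 1);

    var i: usize = 0;
    while (i < source.len) {
        while (i < source.len and source[i] == ' ') i += 1;
        if (i == source.len) break;
        const start = i;
        while (i < source.len and source[i] != ' ') i += 1;
        // Pathological inputs that blow past the estimate simply pay for the
        // extra reallocation(s) inside `append`.
        try tokens.append(allocator, .{ .start = start, .len = i - start });
    }
    return tokens;
}

test "lexing with an estimated capacity" {
    var tokens = try lex(std.testing.allocator, "reasonably ordinary input with longer tokens");
    defer tokens.deinit(std.testing.allocator);
    try std.testing.expect(tokens.items.len > 0);
}
```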

So trying to do a completely foolproof up-front allocation may not be what you want. Kinda depends on what you are trying to build and what your tolerances are. If you are TigerBeetle, you probably never want to run out of memory after startup, so it makes sense. But if you are a compiler, you want your happy path to be fast and not worry so much about the extreme cases.

3 Likes

This feels like a case where virtual memory could help: reserve address space for the upper bound or “worst case estimate”, but only commit memory as needed, starting from a reasonably estimated size. A side benefit is that your token list can grow in place up to the reserved size.

You’d need to be careful when targeting 32-bit operating systems/targets though, since address space itself is scarce there.
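
A rough sketch of the reserve/commit idea, assuming Linux with 4 KiB pages and the Zig 0.12-era std.posix API (flag types and page-size constants have moved around between Zig versions):

```zig
const std = @import("std");

// Assumed 4 KiB pages for simplicity; real code should query the page size.
const page_size = 4096;

/// Reserve address space for the upper bound up front; commit pages lazily.
const ReservedBuffer = struct {
    reserved: []align(page_size) u8,
    committed: usize = 0,

    fn init(max_bytes: usize) !ReservedBuffer {
        // PROT.NONE: the range is reserved in the address space, but no
        // physical memory is committed and any access faults until mprotect.
        const mapping = try std.posix.mmap(
            null,
            max_bytes,
            std.posix.PROT.NONE,
            .{ .TYPE = .PRIVATE, .ANONYMOUS = true },
            -1,
            0,
        );
        return .{ .reserved = mapping };
    }

    /// Make the first `len` bytes usable, rounded up to whole pages.
    fn commit(self: *ReservedBuffer, len: usize) !void {
        const wanted = @min(len, self.reserved.len);
        const new_committed = (wanted + page_size - 1) / page_size * page_size;
        if (new_committed <= self.committed) return;
        try std.posix.mprotect(
            self.reserved[0..new_committed],
            std.posix.PROT.READ | std.posix.PROT.WRITE,
        );
        self.committed = new_committed;
    }

    fn deinit(self: *ReservedBuffer) void {
        std.posix.munmap(self.reserved);
    }
};

test "reserve a large upper bound, commit only what is needed" {
    var buf = try ReservedBuffer.init(1 << 30); // reserve 1 GiB of address space
    defer buf.deinit();

    try buf.commit(64 * 1024); // back only the first 64 KiB with usable pages
    buf.reserved[0] = 42;
    try std.testing.expectEqual(@as(u8, 42), buf.reserved[0]);
}
```

Because the reserved range never moves, the token list really can grow in place right up to the reserved size.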

5 Likes

In addition to what @glfmn said, if you need way more virtual memory than the available physical memory, you have to think about the size of the page tables themselves.

The size they need is roughly virtual_memory / page_size * page_table_entry_size.

For example, mapping the entire 32-bit virtual address space with 4 KiB pages and 4-byte page table entries would need 2^32 / 2^12 * 2^2 = 2^22 bytes = 4 MiB just for the page tables.

If you scale this to something like 1 PiB (with 8-byte entries), you get 2^50 / 2^12 * 2^3 = 2^41 bytes = 2 TiB. At that point you would want to think about using large pages.
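
The same arithmetic as a quick check:

```zig
const std = @import("std");

test "page table overhead for fully mapped address spaces" {
    const KiB = 1 << 10;
    const MiB = 1 << 20;
    const TiB = 1 << 40;
    // size ~= virtual_memory / page_size * page_table_entry_size
    try std.testing.expectEqual(4 * MiB, (1 << 32) / (4 * KiB) * 4); // 4 GiB space, 4-byte entries
    try std.testing.expectEqual(2 * TiB, (1 << 50) / (4 * KiB) * 8); // 1 PiB space, 8-byte entries
}
```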

1 Like

I think this pretty succinctly answers my question, so I’m marking it as a solution.

I found it pretty interesting that Zig ‘just’ goes with the average; I think I’d probably be more likely to use the tail end of the distribution. I guess that requires a lot more work than size = bytes/tokens.

Ah yea, that’s a good perspective I suppose. Do I really care about performance in the pathological case where people are using my screwdriver to hammer in nails or cut down trees?

This feels like a very clever solution, in the good sense. Keep it contiguous in memory up to the upper bound, but only ever commit the amount you need.

Thank you all for the replies. I’ve got food for thought. :sweat_smile:

2 Likes

The simple answer is that a small slice and a large slice have exactly the same overhead, since a slice itself is just a pair of [ptr, len], as pointed out by Calder-Ty.

The other answer is that allocating a lot of extra memory does have an impact on performance. One case is that larger allocations are more likely to force other data out of the L1/L2 caches.

Most modern languages support dynamic arrays with expanding capacity; Zig has the ArrayList. Last I checked, a 2x capacity-doubling policy gives appends an amortized cost of O(1), i.e. the cost of the copying on expansion, amortized over all the append operations, works out to a constant per append.

Zig grows capacity ‘super-linearly’ with a factor of less than 2x, but it should have a similar amortized cost.

I wouldn’t worry about striving for maximum performance by pre-allocating a large buffer. Just use ArrayList or a similar data structure to grow the buffer dynamically, and run benchmarks later to pinpoint the areas of genuinely poor performance.

Edit: ArrayList lets you set an initial capacity. That’s your chance to do a rough pre-allocation. It’s the best of both worlds.
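
For example (the initial capacity of 1024 is just a placeholder for whatever your educated guess is):

```zig
const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Rough pre-allocation: start from an educated guess...
    var list = try std.ArrayListUnmanaged(u64).initCapacity(allocator, 1024);
    defer list.deinit(allocator);

    // ...and let the list grow on its own if the guess turns out to be too small.
    var last_capacity = list.capacity;
    for (0..100_000) |i| {
        try list.append(allocator, i);
        if (list.capacity != last_capacity) {
            std.debug.print("capacity grew: {} -> {}\n", .{ last_capacity, list.capacity });
            last_capacity = list.capacity;
        }
    }
}
```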

2 Likes