Ownership, borrowing, lifetimes, parallel programming and unnecessary complexity

I have a suspicion that lots of systems would be able to shift 95% of allocations to a mix of arenas and the programming-without-pointers style - especially for any tight loops! - and then go ahead and use either GC or RC because why not? It only needs to cover 5% as many pointers, and now the performance difference is irrelevant. Caveats apply, the statistics are made-up examples, it all depends on the system, but… really, why not?

2 Likes

I basically believe the same, because (almost by definition) any non-trivial program has some kind of main loop - from embedded stuff like routers, to GUI applications, to climate simulations that run on thousands of computers. The only real question is what happens inside that main loop.

Modern allocators also basically try to do what we (all?) advocate for - put similarly sized things together in memory. But they don’t go the next step of putting similarly lived things together in memory and freeing them together. And that’s because they simply don’t have enough data to decide what dies together.

I think that Rust could somehow, with its lifetime annotations and some AST analysis, give this additional data to an allocator and thereby have automatic arenas. But I haven’t seen or heard anything about such an idea. And it would likely also be way too complex, just to get somewhat better allocations. But complexity never deterred them anyway…

1 Like

Yes it’s absolutely remarkable.
But I don’t get how someone sees an untyped scripting language developed in 10 days and says “this is a good idea for the future of web dev”, making it the heart of modern web development and the de facto standard.
In that forest of an answer I wrote, there’s also a paragraph about the fact that “web fundamentals” classes teach React. I find this pretty disturbing.

Yes. I think in university we should write C for that reason. No matter what you do later in life: if you can read and write C, you can read and write almost everything (non-functional), and it forces you to understand the computer, which is also very important. Also, most languages are “sons and daughters” of C, so learning the OG language will make you a better programmer for sure.

Lately, I found myself becoming a fan of Bash for everything that’s scripting. I found out you can actually do pretty complicated stuff with bash.
Python never really clicked with me. It has nice syntax and everything but…
I also enjoy Lua quite a lot, but I’m too lazy to really learn it properly.

SQL is a must for any dev. Non-relational databases seem useless to me; at least I can’t think of an actually good use case for them.

I see. Isn’t GC also I/O-bound, in a sense?
Also, in my answers I talked about GC languages while probably generalizing a bit too much. I guess most of my issues with Java aren’t with GC (though I find that manual memory management makes what’s going on clearer to me) but more with syntax/design decisions/the path of least resistance towards OOP. (And maybe I’m a bit biased.)

This is extremely interesting.
An arena allocator doesn’t solve this, right? I guess it would depend on child_allocator but given your answer I’d guess that no allocator does that, i.e. there’s no child_allocator to pass around that has that ability. Would it be possible to implement this idea with an arena, since all the memory is freed within the same .deinit() call?

Apologies for mostly just skimming over your wall of text :smiley:

Some things that stood out to me:

…(about general performance requirements for different types of apps)…

I have seen a handful of very real-world problems in big C++ game code bases which used “one heap allocation per object” and refcounting for lifetime management. Those problems were measurable, but it was too late to meaningfully fix them without rewriting everything from scratch (not realistic):

  • significant time was spent in memory allocation/deallocation, with thousands of allocations happening per frame - this could partially be worked around by integrating a ‘fast’ allocator like jemalloc, but that is really just plastering over the actual problem
  • related: significant time was spent in ‘destruction cascades’ via RAII, e.g. you quite often had big object trees made of thousands or tens of thousands of tiny objects, and destroying the top-level object could take significant time (e.g. seconds for uninitializing a level or on game shutdown). This problem isn’t really uncommon - ever noticed how many Steam games take significant time to shut down? That’s almost certainly hundreds of thousands of tiny C++ objects being destroyed via some RAII ‘destruction cascade’
  • hard to pinpoint in profilers, but such code is usually full of cache misses, since all those tiny C++ objects are located at more or less random memory locations

These problems can be solved by using arrays of objects (e.g. thousands of objects of the same type living in a single heap allocation, ordered by “processing order”, which helps to avoid cache misses) - the whole idea of ECS is built around this concept.
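To make the contrast concrete, here’s a minimal C sketch of the “array of objects” layout (the names `Entity`, `entities_create` and `update_all` are made up for illustration): the whole collection lives in one heap allocation and is processed front to back, so memory access stays sequential.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative sketch: all objects of one type live in a single heap
   allocation instead of one allocation per object. */
typedef struct {
    float x, y;
    float vx, vy;
} Entity;

Entity *entities_create(size_t count) {
    /* one allocation for the whole collection */
    return calloc(count, sizeof(Entity));
}

void update_all(Entity *es, size_t count, float dt) {
    /* front-to-back iteration over contiguous memory: the prefetcher
       sees a linear access pattern, so cache misses are rare */
    for (size_t i = 0; i < count; i++) {
        es[i].x += es[i].vx * dt;
        es[i].y += es[i].vy * dt;
    }
}
```

Destroying all ten thousand entities is then a single `free(es)` - no per-object destructor cascade.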

And the important takeaway is that by the time your app has grown to a million lines of code and 100k users, it’s too late to tackle those problems, because you’d need to completely re-architect the entire codebase. Hotspot optimizations won’t help much.

The most typical situation I’ve seen for performance problems down the road is that systems were initially designed, implemented and tested to process a couple hundred items, but before you know it that number has grown to tens or hundreds of thousands of items once the system has been in real-world use for a while. Suddenly big-O and memory latency start to matter.

Often you’re lucky and the system never hits such a performance wall though.

And don’t get me started on TypeScript™…

TS is actually a decent language (for the problems it needs to solve). It could be a much better language without the requirement of adding type safety to large existing JS codebases. IMHO you can write shitty code in any language, even (or especially) in highly opinionated and restrictive languages like Rust.

Also if you restrict yourself to the ‘good parts’ of JS (e.g. via strict linting), it’s also not too bad. JS has come a long way since the late 1990s, and at least since V8 performance is also nothing to sneeze at.

Why? What do you mean by restrictive languages? The ownership/borrowing and bounded-lifetimes kind of restriction? Or would you include Zig under that umbrella?

(Safe) Rust puts a lot of restrictions on the code so that the compiler can create a proof that the code is correct (at least in terms of memory safety). Zig is somewhere between Rust and C on that dimension, but not close enough to Rust to be outright annoying :wink: IMHO a too strongly-typed type system is just as bad as a too weakly-typed one. My personal sweet spot would be somewhere between Zig and C.

But at this point, why not just choose the simplicity of C over any other alternative, if it’s what gives you the most reliability, performance and flexibility in the long term?

…fair point, despite the obvious advantages of Zig over C. In some areas C is simply too sloppy, and Zig’s increased strictness is a good thing (but at the same time it can become annoying - it’s incredibly hard to find the right balance between enforced correctness and “joy to use” - unless you’re a masochist of course, but then you’d pick Rust over Zig anyway). OTOH, when writing cross-platform code that needs to access OS APIs (especially esoteric APIs like macOS ObjC frameworks), the C world still has advantages over Zig. E.g. while it’s possible to access macOS ObjC APIs from Zig by going through the ObjC runtime C API, this is definitely not a nice experience compared to just having a bunch of macOS-specific ObjC mixed into your C code. There are a couple more little papercuts which still make me pick C over Zig for “bedrock libraries”.

…about the memory gap…

The factor 100 at the beginning of the 90s must have been for high-end workstations. Typical 8- and 16-bit home computers running at 1..8 MHz could access memory at roughly CPU clock-cycle speed.

4 Likes

they don’t go the next step of putting similarly lived things together in memory and free them together

Yes, exactly. When I first started using arenas I was looking for automatic cleanup after years of work with GC languages, but really it’s a collection of allocations that all are freed at the same moment. (The first rule of tautology club is … xkcd: Honor Societies )

The light bulb didn’t turn on all at once. When I consider a Rust compiler performing a thousand lifetime analyses vs a Zig developer manually performing one lifetime analysis… well, I’m technically impressed by Rust, but I don’t really want to actually use it.

The “5%” (from my prior post; to reiterate, a made-up statistic) does represent a thornier problem, but it seems like a GC or a system of Handles should be able to cover it quite well.

Rust lifetime analysis is done at the type system level, the number of objects is not relevant.

…it is very likely relevant for build times though :wink:

Also a lot of real-world Rust code I’ve seen uses hacks like RefCell<>, Rc<> or Arc<> as ‘borrow checker workaround’, and those all incur runtime overhead.

2 Likes

Nah, I was referring to the use of arenas in the post I replied to, I should have quoted a larger piece. When you have a collection of objects and you treat them all the same way, lifetime analysis is not impacted by the number of objects in the collection.

There is probably confusion, on my part and others, because different aspects of lifetimes and ownership are being talked about together.

3 Likes

True, although this is a separate issue because it isn’t always necessary and not always a significant cost. When I’m writing Rust I generally do find ways to avoid Rc/Arc, but I do have to bend over backwards sometimes to do it!

When sharing of objects is long term, in the sense that the ref count is only incremented/decremented during setup/shutdown for a long series of operations (perhaps for the entire program run), the cost of the ref counting is effectively zero. That’s the way I try to use ref counting, rather than for every object in a collection. For example, the ref count should be on the collection, not on its elements.
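As a hedged sketch of that idea in C (all names here are made up): one reference count guards the whole collection and is only touched when a long-lived user appears or goes away, while the elements themselves are plain data.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative sketch: a single reference count for a whole
   collection; elements carry no per-item count. */
typedef struct {
    int refs;
    size_t len;
    int *items;   /* plain data, shared as a unit */
} SharedList;

SharedList *list_new(size_t len) {
    SharedList *l = malloc(sizeof *l);
    l->refs = 1;
    l->len = len;
    l->items = calloc(len, sizeof(int));
    return l;
}

/* retain/release happen at setup/shutdown of a long series of
   operations, not on every element access */
SharedList *list_retain(SharedList *l) { l->refs++; return l; }

void list_release(SharedList *l) {
    if (--l->refs == 0) {     /* last user: free everything together */
        free(l->items);
        free(l);
    }
}
```

Rust’s `Arc<Vec<T>>` has the same shape: the atomic count sits on the vector, not on each `T`.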

P.S. I should add that I did not always find ways to avoid Rc/Arc when I’d like to. In addition, there are certain data structures that cannot be represented in safe Rust, or at least not without undesirable runtime safety checks. The latter problem (for my Btree cache) is the reason I’m using Zig. Every tool has its trade-offs, and that’s where an engineering mindset (best tool for the job) can help to avoid philosophical rigidity.

1 Like

Garbage Collection is hugely popular because it’s good for development speed

Garbage collection is a very good pattern if the development budget is the most important factor. It really speeds up development when you don’t have to worry about lifetimes AT ALL! This is often the case in a project’s initial phase: in a corporate environment you have X time to deliver proof that this will work. In a startup you either need proof to get funding or you need to deliver to get sales; either way, in many scenarios development speed is critical.

The additional runtime cost is rarely a factor. Most of the time the software will be running on client hardware; it’s only when you’re hosting the software on servers that the runtime cost actually costs the developers anything. And even in that case it often scales with activity, which should equate to income, in which case it’s rarely a problem.

But Garbage Collection sucks for games and time-critical servers

The exception to this is games and time-critical servers, where garbage collection can cause annoying hiccups; sudden drops in framerate can really make an otherwise smooth game feel sluggish. Anyone who has tried optimizing performance in Unity will tell you that garbage collection is terrible if performance spikes matter. The initial savings in development cost are dwarfed by the cost of optimizing away the spikes caused by garbage collection. On top of that, you have to invent anti-patterns to mitigate the problems.

Old school allocators (heap allocators)

It is important to remember that when garbage collection was invented, the only real alternative was heap allocators. Heaps have their own performance problems and often make everything unnecessarily slow. In the old days memory leaks were a huge concern, mostly due to poor tooling. I remember a time when most shipped applications actually had memory leaks; I’ve been at companies that said it was virtually impossible to remove all leaks and that only significant leaks should be investigated! (This is a false claim, but with the tooling available it was sorta true.)

Modern tooling like debug allocators makes finding memory leaks trivial: you get the call stack where the leaked memory was allocated, so you know exactly what the purpose of the allocation was.

When I say heaps are slow, it should be taken with a grain of salt; Unreal Engine probably runs on a single heap allocator, and you can make blazingly fast games with that. What I mean is that if you use a single heap for everything and you mix temporary allocations with persistent allocations, you tend to end up with very fragmented memory. Fragmented memory has two problems. One is that cache misses become a constant drag, reducing performance from what it could be. Careful alignment of critical data structures can really help mitigate this problem, and simply putting stuff in arrays rather than doing individual allocations is already a huge improvement. The second problem is that it takes a lot of bookkeeping for the heap to keep track of used and freed memory. Generally a heap can be optimized for performance or for reducing fragmentation, but it’s hard to optimize for everything at once.

Arena Allocators are awesome

Arena allocators feel like an actual cheat code. As long as you know the maximum lifespan of your allocations, you can set up an arena and allocate as freely as in a garbage-collected language. Once you free (or reset) the arena, everything in it is automatically freed; no need to worry. On top of that, all your new allocations magically end up right where your cache is - in this way it resembles the stack. The only thing to keep in mind is that if for some reason you want your data to survive and become persistent, you need to copy it into memory obtained from another allocator… the cost of copying data is minuscule in the big scheme of things, so unless the data is huge this should not be a concern.
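The core of an arena really is tiny. Here’s a minimal fixed-capacity bump-pointer sketch in C (real arenas, including Zig’s `std.heap.ArenaAllocator`, grow in chunks and honor per-allocation alignment requests; this one just uses a fixed 16-byte alignment for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative fixed-size bump-pointer arena. */
typedef struct {
    uint8_t *base;
    size_t   cap;
    size_t   used;
} Arena;

Arena arena_init(size_t cap) {
    Arena a = { malloc(cap), cap, 0 };
    return a;
}

void *arena_alloc(Arena *a, size_t size) {
    size_t at = (a->used + 15) & ~(size_t)15;  /* 16-byte alignment */
    if (at + size > a->cap) return NULL;       /* arena exhausted */
    a->used = at + size;
    return a->base + at;
}

/* "free everything at once" is just rewinding the bump pointer */
void arena_reset(Arena *a) { a->used = 0; }

void arena_deinit(Arena *a) { free(a->base); a->base = NULL; }
```

Allocation is a couple of arithmetic operations, and freeing N allocations is O(1) - which is exactly why the destruction cascades discussed above disappear.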

Mix and match

The real major benefit of Zig when it comes to memory is that for most scenarios you need two things, very likely at the same time: temporary memory that only matters inside the immediate action, and long-term persistent allocations, which often (but not always) can be deleted on an individual level.

With languages like Rust you try to plan out lifetimes and ownership up front; the problem is that making this cover every imaginable scenario becomes very convoluted and overly complicated.

Safe memory handling is easy

You just need to match each allocation with a free operation. You do this all the time: if you open a file you close it, if you write an opening brace you write a closing brace - same thing. The defer statement is excellent for short-lived allocations. Arenas are really good at handling this at the level of an action - think mouse click, keyboard event, serving an HTTP request, or anything that has a clear begin and end - because rather than worrying about each allocation, you can just set up an arena, and as long as you free (or reset) it you’re good (assuming your system is not significantly memory constrained).
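In C, without `defer`, the same pairing discipline is usually expressed with a single cleanup path, so every acquire still has exactly one matching release on every exit path (`process_file` is a made-up example for illustration):

```c
#include <stdio.h>
#include <stdlib.h>

/* Made-up example: each acquire (fopen, malloc) is released exactly
   once, in reverse order, on every path out of the function. */
long process_file(const char *path) {
    long result = -1;
    FILE *f = fopen(path, "rb");
    if (!f) return -1;             /* nothing acquired yet */

    char *buf = malloc(4096);
    if (!buf) goto close_file;     /* release only what we hold */

    result = (long)fread(buf, 1, 4096, f);

    free(buf);                     /* matches the malloc */
close_file:
    fclose(f);                     /* matches the fopen */
    return result;
}
```

Zig’s `defer` expresses the same pairing right next to the acquisition, which is much harder to get wrong.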

The only remaining problem is persistent state, when you know you want memory to live longer than your current action, but you don’t know exactly how long. In those scenarios it’s easy to over complicate the problem and invent atrocities like Rust (yes, I said it, sorry not sorry).

It can be daunting to map out a guarantee that the memory gets freed eventually. But it doesn’t have to be. Whenever you make a persistent allocation, you pass that pointer off to somewhere: you add it to a list, a tree or some other kind of state owner. As long as it remains there, everything is good; only once you (re-)move it from there do you have to either free it or place it in another state owner. Split your state into a number of “state owners” and make sure each of them either frees each allocation or passes it on to another state owner. Do this correctly and you’ll never have leaks; do this incorrectly and the debug allocator will save you.
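A minimal sketch of such a “state owner” in C (names are made up): the owner is the one place that holds persistent pointers, and destroying the owner is the one place they get freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative state owner: a singly linked list that takes
   responsibility for freeing whatever is handed to it. */
typedef struct Node {
    struct Node *next;
    void *data;
} Node;

typedef struct {
    Node *head;
} Owner;

/* hand a persistent allocation off to the owner */
void owner_add(Owner *o, void *data) {
    Node *n = malloc(sizeof *n);
    n->data = data;
    n->next = o->head;
    o->head = n;
}

/* everything still held by the owner is freed here; an allocation
   can only leak if it was never added to any owner */
void owner_destroy(Owner *o) {
    Node *n = o->head;
    while (n) {
        Node *next = n->next;
        free(n->data);
        free(n);
        n = next;
    }
    o->head = NULL;
}
```

Moving an allocation between owners (instead of freeing it) would just be one more function that unlinks a node here and links it into another owner.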

Heap and debug allocator

For the persistent memory that some applications absolutely need to have, you should just use a heap… all my criticism of the heap only applies if it’s misused. If you direct all temporary allocations to arenas and only use the heap for persistent data, you will rarely hit the corner cases that are truly horrible. The other concern is memory leaks, but thankfully Zig has the debug allocator built in! The debug allocator was the wet dream of any developer back in the ’90s: it keeps track of exactly which allocations were never freed, and exactly where they were allocated! Armed with this, finding and removing memory leaks is trivial.

TLDR

Arenas are just as easy to use as garbage collection, but on top of that they come without performance spikes (or even significant cost); the only thing is that you need good logical boundaries for when the memory is no longer needed. By using arenas to offload the heap, you ensure the heap never becomes the problem, and you get significantly better performance than garbage collection can offer, all for only a small additional effort.

1 Like

Same can be said of HTML

I think that’s also why I wrote these posts; I need to understand if I’m the crazy one for thinking this stuff.

You’re not crazy for wanting to use programming languages that you like and work with simple, well designed software… but your post does make you sound crazy.

All jokes aside, you say you are a student; have you worked in the industry yet?

The reality is that a lot of companies are competing on quantity of features over quality of the product or the system. There is a lot of “good enough” software out there that gets the job done, there is tons of awful software that still provides enough value, and there are a lot of domains where there are no existing good options.

You might end up writing awful JS, but you also own the means of production and can start your own business on the back of the best software you can write.

1 Like

One other thing I will say is that the amount to learn and keep up with in this field is absolutely staggering, and it’s impossible for people to be even somewhat knowledgeable about most things.

Many people are just struggling to keep up and learn the things that they need to in order to be effective in the short term.

Sure, thanks to everyone that took the time to answer and didn’t just say “I won’t even bother” after seeing the thousand-line post I wrote.

But one probably shouldn’t speculate on this, also given the above points you mentioned, especially

I mean, JS is OK. I use JS, but exclusively for webc and the DOM. I’ve never had a problem when it’s used in those contexts.

I have a problem in using JS for everything.

Proof of correctness? Wouldn’t this involve some sort of validation (which is an undecidable problem to my knowledge)?

I guess? Not sure about it though. I find HTML quite good. Never given many thoughts about it to be honest.

Anyway, at least we didn’t try to make it the frontend, backend and database language.

just to chime in from the mathematician corner: just because a problem is undecidable in full generality does not mean one can’t make meaningful headway in significant special cases. for example, Zig’s comptime invites the Halting Problem, the most famous undecidable problem, over for dinner. Zig does not even attempt a solution; instead it says “comptime evaluation will halt because I’ll halt it”.

one of the reasons Safe Rust is aimed at producing a subset of correct programs is (perhaps) because those are the programs for which the problems the borrow checker must solve are decidable.

7 Likes

Thanks.

I reread the post today and I have to admit that I got a bit carried away by my… let’s call it passion. I guess I had a lot of these thoughts and feeling bottled up for a while without being able to express them and I guess on that post they just came out.

Also, I’m probably not all good in my head anyway :joy:

Yes and no. I did some freelancing, and I did a couple of internships at small (20k-50k users) companies, but the technical level was pretty low overall (mine and theirs). If you meant big companies with skilled devs, then no, I didn’t.

I believe comptime execution even belongs to the complexity class of primitive recursive functions since @setEvalBranchQuota sets an upper bound for all backward branches, so the halting problem is actually mathematically decidable for comptime!

They are rare, but you know it when you see it; it normally happens when trying to hammer your problem into a set of tables becomes really awkward.

1 Like