API Design

nyc · May 26, 2024, 3:34pm

I write this kind of code daily - I do trading systems for prop firms, and the properly predicted branch will get pipelined away in the rest of the code. That predicted branch won’t even cost an instruction slot. You are going to be limited by running out of the uop cache if the loop is that tight.

And in code like this you are trying to avoid hash lookups as much as possible and instead tend to do a lot of bulk operations (e.g. take everything from the last recv, put the keys in a 32-entry array, create a u32 where each bit is an add or del flag, make one call into the ht to do all of that, and get a 32-entry result vector back). These are the things that can make code very fast.
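A rough sketch of that bulk shape, assuming Zig and using std.AutoHashMap as a stand-in for a custom table. The name applyBatch is hypothetical, and the result is packed into a single u32 mask here for brevity, where the post describes a 32-entry result vector:

```zig
const std = @import("std");

// Hypothetical batch entry point. Bit i of `add_mask` set means "insert
// keys[i]"; clear means "delete keys[i]". Bit i of the result is set when
// that operation actually changed the set.
fn applyBatch(set: *std.AutoHashMap(u32, void), keys: []const u32, add_mask: u32) !u32 {
    std.debug.assert(keys.len <= 32);
    // One capacity reservation for the whole batch instead of one per key.
    try set.ensureUnusedCapacity(@intCast(keys.len));
    var changed: u32 = 0;
    for (keys, 0..) |key, i| {
        const bit = @as(u32, 1) << @intCast(i);
        const did_change = if ((add_mask & bit) != 0)
            !set.getOrPutAssumeCapacity(key).found_existing
        else
            set.remove(key);
        if (did_change) changed |= bit;
    }
    return changed;
}

test "one call applies a mixed add/del batch" {
    var set = std.AutoHashMap(u32, void).init(std.testing.allocator);
    defer set.deinit();
    const keys = [_]u32{ 10, 20, 30 };
    // Add 10 and 20; delete 30, which is absent, so bit 2 stays clear.
    const first = try applyBatch(&set, &keys, 0b011);
    try std.testing.expect(first == 0b011);
    // Add 10 again (no change), delete 20 (changed), delete 30 (no change).
    const second = try applyBatch(&set, &keys, 0b001);
    try std.testing.expect(second == 0b010);
}
```

The capacity check happens once per batch, so the per-key path has no growth branch at all. A real table would presumably also batch the hashing and probing, which this sketch does not attempt.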

The assume capacity calls are pipelined away and the OOO execution makes them irrelevant. I can’t stand how bloated they make an API and how they detract from true high performance code - it gives people a good feeling that they are writing high performance code when really it isn’t making any difference and maybe if they knew that they would actually learn to write faster code.

But I also work on very high-end Intel machines, and don’t have to deal with low-power chips and in-order execution (old Intel Atom), so my view of performance can be overly narrowed by my target platforms. (This is one case, though, where it really doesn’t matter.)

There are also ways to take out the assume capacity calls incorrectly. E.g. this is very wrong: the work function is always calling checkCap unless the compiler inlines it, and then work itself might be inlined and you’ve polluted the hot path with a bunch of cold code.

pub fn work() void {
    checkCap();
    // ... do my work
}

fn checkCap() void {
    if (new_size <= capacity) return;
    // ... realloc and grow
}

Instead you want to do the capacity check either by hand or in a tiny function that is force-inlined, and keep all the growing code in another function (often marked noinline to make sure the compiler doesn’t get any funny ideas):

pub fn work() void {
    checkCap();
    // ... do my work
}

inline fn checkCap() void {
    // Hot path is just the compare; grow only when the new size does not fit.
    if (new_size > capacity)
        growCap();
}

noinline fn growCap() void {
    // ... realloc and grow
}

(Usually the check will just be done at the top of the call by hand instead of having a function, unless it is an unintuitive check that is easy to screw up.)

That’s how you are supposed to do it, and in that code path work is as straight as possible. You don’t want any complex methods getting inlined: you only get around 30 uops to fit in the loop stream buffer, which is the optimal case. (In highly, highly optimized code it might matter to elide that check, since you would be using one uop (the test and branch get fused and I think only count as 1 uop, but I’m not sure) of the 32 you get, but that would be a very special case and everything else would need to be optimized perfectly for that to even matter.)

I’m just not a fan of how bloated the zig APIs are. There are a lot of false optimizations and pessimizations in the code too that I have a constant battle over, but I do understand that is the way the zig community does things.

nyc · May 26, 2024, 3:45pm

I totally should have looked at your code first. I just did, to see if I could help in any of the hot paths. I didn’t realize you were wrapping HashMap. In that case it makes complete sense to just copy its API regardless.

Sorry, carry on.

mnemnion · May 26, 2024, 6:01pm

I agree that the AssumeCapacity functions are useless on superscalar chipsets with good branch prediction.

But as you allude to here:

Not every CPU performs that way. In fact if we were counting CPUs, most do not.

Zig is intended to be useful for low-end microcontrollers, including AVRs with kilobits of addressable RAM. Those tend to just do what you ask. If the community using Zig in embedded were to say that AssumeCapacity functions don’t make a difference for their code, then I’d support removing them. But this is unlikely to be the case.

The standard library deserves to be much better documented than it is, and that should include some use notes, definitely including something about how AssumeCapacity functions are intended for very low-end CPUs and won’t help you at all in a hosted context. I suspect many of the people drawn to Zig have a gottagofast mentality, but one which isn’t necessarily backed up by a rich understanding of how CPU execution works on their target systems. Assuming the risk of memory corruption for an ‘optimization’ which is irrelevant to your target architecture is the sort of thing which should be warned against.

nyc · May 26, 2024, 6:22pm

I’d be totally curious whether it helps on any architecture.

the state of the zig benchmark or profiling landscape is fairly poor still so might be a while before that is figured out.

I am well aware of my architecture slant too. Embedded is about the only place where you have to be limited and fast at times. E.g. if you are running on an Atom processor, that branch isn’t going to mean shit on your cpu because everything is slow…

a lot of people don’t get performance optimization. it is a rare skill, earned with a lot of experience.

mnemnion · May 26, 2024, 6:57pm

AVRs don’t have branch prediction, period. If there’s a branch in your program, the chip will take it.

Then again, they have at most a two-stage pipeline, and no caching, so the effect of a branch is less extreme.

That leaves code bloat, which is a pressing concern in embedded. Although at that point I wonder how many std data structures are even feasible to use.

I haven’t programmed a micro in about ten years, and never professionally. Hence I would want to defer to the Zig embedded group on whether the AssumeCapacity collection of functions is genuinely useful to anyone.

AndrewCodeDev · May 26, 2024, 9:48pm

@mnemnion, @nyc… There’s another difference between the APIs, though, that I’d like to point out and that I do not see being addressed. The assumed capacity methods do not return an error. Now, we may agree that there is still some deeper misgiving with the API design, but that’s one other substantive difference here worthy of conversation.

mnemnion · May 27, 2024, 12:32am

It is a good point to bring up. So that’s two cold branches and an extra register. I suspect @nyc is right that this basically adds up to nothing on modern hosted systems, but for constrained platforms it could be enough to tilt the balance.

nyc · May 27, 2024, 4:39am

@mnemnion there’s no cost on the caller. If you know you have enough capacity, just discard the error with catch unreachable. That’s basically what it is there for. And if inlined it goes away for the callee too.
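To make the two spellings concrete, here is a minimal Zig sketch on a hypothetical fixed-capacity buffer. Fixed, its capacity of 8, and error.NoCapacity are invented for illustration and are not std types. The std containers differ in that, as noted later in the thread, their non-assume methods allocate when full instead of returning a capacity error:

```zig
const std = @import("std");

// Hypothetical fixed-capacity byte buffer, not part of std.
const Fixed = struct {
    items: [8]u8 = undefined,
    len: usize = 0,

    // Checked variant: a full buffer is reported through the error union.
    pub fn append(self: *Fixed, byte: u8) error{NoCapacity}!void {
        if (self.len == self.items.len) return error.NoCapacity;
        self.appendAssumeCapacity(byte);
    }

    // Unchecked variant: the caller promises there is room.
    pub fn appendAssumeCapacity(self: *Fixed, byte: u8) void {
        std.debug.assert(self.len < self.items.len);
        self.items[self.len] = byte;
        self.len += 1;
    }
};

test "both spellings make the same promise" {
    var buf = Fixed{};
    // The promise is made at the call site.
    buf.append('a') catch unreachable;
    // The same promise, carried by the function name.
    buf.appendAssumeCapacity('b');
    try std.testing.expect(buf.len == 2);
    try std.testing.expect(buf.items[0] == 'a' and buf.items[1] == 'b');
}
```

In safe build modes both calls trap if the promise is broken; in unsafe release modes both become undefined behavior, so neither spelling is the safer one.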

I know there is a small cost to error-producing functions in debug because it records error trace information and undoes it on return. I don’t know if that same penalty exists in release small (which is essentially the only time I think mnemnion thinks it ever matters). If that hidden cost still does exist:

If the extra if is going to get you, all those error return trace records are really going to kill you, so you shouldn’t be using most of std at that point. The assume calls could definitely make it palatable enough to bite your tongue and use std, but I don’t know if that penalty still exists in release modes. (That was kind of confusing. tl;dr: the assume calls might be even better for release small users, since they don’t have the extra check and they don’t have to pay the error stack costs all the time.)
But this only exists if things aren’t getting inlined. That’s the hope for all this. The extra branch really only matters if you are getting inlined, because the non-inlined code is already paying a branch cost on the function call, and these devices, if they don’t have branch prediction, also aren’t going to have branch target prediction and are generally already paying a fat cost on the function call in a hot loop.

So it is kind of an odd situation: if they don’t get inlined, assume variants might make it more palatable, if only you weren’t already paying for a branch in the hot loop.

if you do get inlined, the error return doesn’t matter because it’s gone now and catch unreachable is going to solve any other problem.

If you are worried about someone misusing catch unreachable, I think people are just as likely to misuse that as they are the assume variants. I do wonder how many times people have given themselves an 8-hour debugging session thinking they already reserved enough and then used the assume variants. lololol.

If there was no cost for having these calls, I wouldn’t really care about it, but there is a cognitive load to them, and it makes it harder to go through the code and docs to find what you want. It makes some APIs almost double in size. My general cost-benefit calculus falls on not having them just because of the cognitive load. And even if it helped a very small minority (very, very small considering the conditions that need to occur for it to matter at any call site), I would still say remove them because it harms everybody else a little too.

Zig, like all things, makes decisions like that all the time. Zig retries or disallows handling of certain errors from some OS calls because it would help some people write better code, even though it definitely hurts a small minority. I hit that just yesterday - I have to find a way around a munmap / page allocator issue where the GitHub issue doesn’t seem to be considered a bug, because most people won’t hit it.

This discussion has definitely made me think more in depth about the issue, and it hasn’t changed my mind, but I have a deeper understanding of the issue than I did going in. That’s some good shit.

nyc · June 3, 2024, 7:36pm

I just did something a little non-orthodox (read: stupid) and I’m going to see how it plays:

I have the alloc argument as an anytype. I made a null allocator that sends to unreachable, and it has no error return. If the anytype has an error union for a return type so does the function.

So now I don’t need an assume variant. The function is very few lines and will almost certainly be inlined, and any branch that goes to an allocation will be DCE’d away, so no penalty on that.

The only problem is that it is an anytype, and those kind of suck. But besides that, it might work. We’ll play it out.
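A minimal sketch of how that could look in Zig, with every name invented for illustration (NullAlloc, HeapAlloc, PushResult and List are hypothetical, and the real code from the post may be organized quite differently). The return type is computed from the type of the anytype argument, so the null allocator yields a function with no error union at all:

```zig
const std = @import("std");

// Hypothetical "null allocator": growing is declared impossible, so its
// grow function has no error in its return type.
const NullAlloc = struct {
    pub fn grow(_: NullAlloc, _: []u32, _: usize) []u32 {
        unreachable;
    }
};

// Hypothetical wrapper around a real allocator, whose grow can fail.
const HeapAlloc = struct {
    backing: std.mem.Allocator,

    pub fn grow(self: HeapAlloc, old: []u32, new_len: usize) error{OutOfMemory}![]u32 {
        return self.backing.realloc(old, new_len);
    }
};

// Plain void for NullAlloc, an error union for everything else.
fn PushResult(comptime A: type) type {
    return if (A == NullAlloc) void else error{OutOfMemory}!void;
}

const List = struct {
    buf: []u32,
    len: usize = 0,

    pub fn push(self: *List, alloc: anytype, value: u32) PushResult(@TypeOf(alloc)) {
        if (self.len == self.buf.len) {
            const new_cap: usize = if (self.buf.len == 0) 4 else self.buf.len * 2;
            const grown = alloc.grow(self.buf, new_cap);
            // The condition is comptime-known, so only one arm is analyzed.
            self.buf = if (@TypeOf(alloc) == NullAlloc) grown else try grown;
        }
        self.buf[self.len] = value;
        self.len += 1;
    }
};

test "return type follows the allocator type" {
    var storage: [2]u32 = undefined;
    var fixed = List{ .buf = &storage };
    fixed.push(NullAlloc{}, 1); // plain void: no try, no catch
    fixed.push(NullAlloc{}, 2);
    try std.testing.expect(fixed.len == 2);

    var dyn = List{ .buf = &[_]u32{} };
    defer std.testing.allocator.free(dyn.buf);
    const heap = HeapAlloc{ .backing = std.testing.allocator };
    try dyn.push(heap, 7); // error union: has to be handled
    try std.testing.expect(dyn.buf[0] == 7);
}
```

With NullAlloc the growth branch ends in unreachable, which an optimizing build is free to delete; pushing a third element into the fixed list above would trap in safe modes and be undefined behavior otherwise.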

I was throwing dumb ideas off @mnemnion and one was to have an environment context for calls. (This was originally for operator overloading, so you could define operators in a struct/namespace and then write $mystruct( a + * c ) and it would use the operators in mystruct first if they existed. You could do this for matrix overload operators, or change all the operators to the saturating kind and not have to have individual saturation ops: $saturating_namespace(a + b + c).)

But I was wondering why not put an allocator in there too, and change the parens to braces and evaluate an expression list with a particular allocator supplied by the environment. You could set the allocator to the null one and all functions automatically become the assume variant just by the nature of the allocator.

totally crazy idea I know, but seems interesting.

mnemnion · June 4, 2024, 1:14am

What a coincidence, I was just doing some background reading and came across a comment on #871 which suggests something almost exactly like you’re describing.

It is, naturally, in a closed issue about operator overloading, of which there are several. I suspect it would be quite difficult to come up with an operator overloading proposal which is different enough from those to justify reopening the question.

I’m afraid we’ve meandered off topic though.

LucasSantos91 · June 4, 2024, 1:50pm

And if it doesn’t get inlined, it will have a cost. You already took the mental effort to decide what to do with the error. After this, calling the assumeCapacity version is free from a mental cost and, in some cases, it can give better performance, so why not use it?
If the assumeCapacity version had duplicated code from the non-assumeCapacity, I agree we could be losing performance due to cache effects, but in the std library, the non-assumeCapacity functions just internally call the assumeCapacity functions, so there is no binary bloat. Essentially, they are just the assumeCapacity functions with extra stuff around it. If you already decided you don’t need the extra stuff, and you’ll have to do that anyways because you need to think what to do with an error, then calling the assumeCapacity function is just obvious.
Even if the branch gets predicted, the prediction machinery can get saturated. Some CPUs don’t have prediction. The compiler might be able to do more optimizations if there is no branch. All of these things you can get for free, because calling the assumeCapacity function is almost always obvious, and you already have to think about the possibility of an error anyways.

nyc · June 4, 2024, 4:48pm

I don’t think zig’s std understands the importance of cognitive load and keeping APIs small. The question isn’t whether there is some use case; it is comparative: is the use case large enough to warrant its inclusion? I don’t think the assume calls generally pass that test, but there could be a few. Going to the data structures in zig and seeing 30 calls where 10 of them are assume variants isn’t a good development experience (this also applies to classes and other trivial function differences).

I don’t think you are getting the performance differences you think you are out of them, and the non-assume calls can be written better to fast-path the non-resizing calls. That single branch will get buried in the pipeline.

I write code where literally nanoseconds matter, and I can count on one hand the number of times a call like this has affected anything that couldn’t be fixed in other ways. Std is trying to optimize for the wrong thing here I think. The trade off it makes for polluting the API and making it much more bug prone is a very questionable performance gain of potentially a single cycle that will only matter in the most rare circumstances.

I could totally be wrong, but I have yet to see anything to show me that.

This is the only value I see in it, but if you are that worried about performance on a processor that has that tight of constraints, you probably aren’t using std bc it just isn’t built well for that. There’s too many other issues to worry about, and this is probably way down the priority list.

A better API for this stuff tends to be taking things in bulk. Instead of an assume variant, take in a block of values. It gets all the assume benefits plus more.

Sze · June 4, 2024, 8:20pm

I personally don’t buy the “too many names in the API” argument. There aren’t that many names; if you look at the code, the implementation is very small, and all the functions are named in a logical and consistent way that explains exactly what they do, and they build upon one another in a logical way. It is just a matter of getting used to it.

I also find it way more satisfying to be able to express my intent precisely by picking the right function, instead of having to pick one overly generalized function and having to hope that the inaccuracies it introduced, then get optimized out again.

Additionally I also think it is a loss in terms of being able to read and understand the code and being able to understand what the original exact intention was.

If the code uses functions that generalize things, then when refactoring that code to do something different, that forces me to rediscover what the original intent was, this can be extremely difficult and time consuming, when the code was written by somebody else, because people think differently.

With functions that do exact things, I don’t have this problem of having to re-re-re-interpret what the intent of the program might have been, to be able to re-write it to use some other data structure. It makes it easier to reason about code locally and change things around, without having to get into philosophical arguments.
It also reduces the amount of code that needs to be considered, when changing stuff, because I don’t have to mentally elide the irrelevant code, that was included, because of the imprecision introduced by generalized functions.

If this was some macro transforming language like racket, I would totally buy that argument for something like a match macro that should just give you a nice interface on the outside and then do all kinds of transformations and optimizations on the inside to generate efficient code for that.

I also would accept it for something high level like APL or BQN, but with Zig I don’t want more layers of “hopefully/probably it will do the right thing” I want less of that and that is what Zig gives you.

Is it more verbose, does it require you to choose more precisely? Yes.
But at least I don’t have to deal with these things anymore, that do a whole bunch of unnecessary stuff, that have nothing to do with what I wanted to do.

Code feels more like legos again, instead of building mud castles.
I find it helps me with getting things done.

And yes there are ways to add items in bulk with the addMany... functions.

I find it tiring, when somebody complains all the time, insisting that everything is badly done and unnecessary and should be done in other very particular ways, it comes off in a way that drags down the mood, feels unproductive and doesn’t really add something.

I think if you really want to convince somebody that your APIs are that much better, then prove it by implementing them in one or a few data structures and creating a package for them, so that others can use those and decide whether they like those better. Otherwise this just seems like you are acting like a guru, telling everyone else how it should be done, without doing any of the work.
Nobody wants to be pushed around by being critiqued.

Critiquing things is ok, but there should be a balance to it. If somebody just keeps critiquing things into the ground, without lifting a finger to do anything about it, I lose interest in what that person is critiquing. Have a growth mindset and actually get going towards creating solutions; don’t get stuck in a loop pointing out problems. If you are that annoyed by it, then do something to change it, or create an alternative.

AndrewCodeDev · June 4, 2024, 9:30pm

This is really the hardest part of the whole issue.

When I was working on fluent with @pierrelgol, I decided to go with enums that switch out argument types at comptime. Why? Just because. I haven’t done it that way before and it turns out that… yeah… you can carry a library to completion that way.

Would I do it again? Maybe in small pieces - it works for what we made and I can make a few arguments in favor of it, but it’s primarily aesthetic at the end of the day. I think a mixed approach is better but it essentially boils down to the same code either way. It’s a bit different but, eh, I kinda like it and may change some things about it in the future - don’t know.

I feel like I’ve read the “for and against” arguments here and frankly it’s very inconclusive. Do it if you want, or if you don’t, but it sounds like what Freud called “the narcissism of small differences”.

nyc · June 4, 2024, 9:42pm

I find it tiring and irksome when people cheerlead constantly. Nothing is really learned and ideas aren’t pushed forward. I’m not big on me-too-ism and posting the 5th comment in a row saying the same thing (for or against). And people usually beat me to saying why things are good. You will see positive comments from me, but they are by definition going to be on less commented threads – and the ideas have to be more than better than average. I also don’t comment on things I find truly inconsequential, like style arguments.

we all draw our lines differently and bring different roles, abilities, and experience to the table. it would be a terribly boring and useless place if everybody was the same.

I don’t relish the role – having everyone hate you gets tiring – but the iconoclast is a needed element in any group. Two somewhat odd qualities of a dysfunctional group are easy decision making and lack of disagreement. Just as constant struggle in those areas isn’t healthy, if every decision is unanimous with little pushback, that isn’t healthy either. I just have a lot of experience, and I have a deeper bag of ideas that I have seen work and fail than others (esp. when performance, networking, etc. are concerned).

If you ever want to switch roles for the week, I’d make a deal – I’ll make only positive comments and you make only constructively critical comments (I almost always give alternatives when I don’t like something).

AndrewCodeDev · June 4, 2024, 10:04pm

Both extremes aren’t helpful, sure - taking the principled middle ground is the hardest place to stand because we have to actually weigh pros-and-cons.

That said, can we get back on topic?

@nyc, your position so far is that what little is gained by separating out the two function types (assumed/unassumed) gives very little benefit in terms of performance gained and you’ve named a couple reasons.

Now, your argument against it really emphasizes the implementation burden and cognitive overhead involved with segmenting out the functions for what you see as little-to-no benefit.

I believe I have accurately restated your position at a high-level. Tell me if I am wrong.

I do not disagree with your principle, but we disagree here on scale for this instance. Yes, functions can be overly atomized and can become sprawling APIs that jump all over the place where no function actually seems to get much done.

Agreed - that’s a problem.

I would argue that we’re not approaching that. The design of ArrayList and the assumption-split does not incur the kind of burden that would worry me.

Specifically, I would argue that what you are talking about is actually an empirical issue fundamentally. How many people are actually confused by this distinction? I can make arguments that people will be confused but that is very different than what actually confuses people.

@Sze and myself actually have a lot of empirical evidence for seeing what people actually struggle with in Zig. We’ve helped a lot of newcomers understand the language and I have never actually seen this as a problem that people face. I just don’t think the empirical evidence is there to support your cognitive burden claim.

Again, we agree that this can become a problem - I don’t think this is an example of it.

deckarep · June 5, 2024, 1:55am

Since I’m getting pinged on this thread I’ll add some of my own views on what I think are good API design considerations.

In most languages, the standard lib code is designed for the common cases. Yes it needs to be very performant but it also needs to serve the needs of many different use cases.

If some people are working in situations where “nanoseconds matter”, usually a standard library collection is no longer going to cut it. At this point you are writing highly customized code to exploit the exact hardware and environment you run on, along with fitting your application use case.

Additionally, when you are so concerned with optimization at this level you really need to bring data into the picture. The performance metrics are what will steer your api, code and design choices …not wild claims.

Can Zig’s standard library get better? Of course but it’s quite good already considering the community is starting to build robust things with it. So I don’t think it’s incredibly healthy to claim everything sucks when people and teams are already building real world software with what we have now.

Zig is young and things are improving and will only get better over time. Also I’m not sure I want a data structure that’s designed for nano-second latency because it likely won’t be a good general purpose experience for my needs. I’ve used heavily optimized libraries before and sometimes they aren’t easy to use period.

No matter what, trade-offs will be made. Zig and the standard library will improve and blanket statements don’t really help. Code matters. Ship a solution that actually shows some performance gains and a better API otherwise it’s just talk.

Don’t forget Zig is open source. If you propose a better API with performance metrics to back it up I’m quite sure it will happily get merged.

The power of open source!

nyc · June 5, 2024, 4:51am

is this directed at me, because i think you have the sides reversed?

I’m the one saying that the extra assume variant calls don’t need to be there because they are for too specific of use cases (as in I write code where nanoseconds matter and we would never use this anyways so optimizing for that case is pointless).

I’m the one saying the extra branch is unimportant and there are more important things to optimize the general case better

The other side - they are saying that the extra branch matters because that performance might be needed.

You clearly agree with my take on that, you just don’t seem to realize it.

So I don’t think it’s incredibly healthy to claim everything sucks when people and teams are already building real world software with what we have now.

Tired of the strawmen. Nobody is saying “everything sucks”. I said the APIs are too large because the overly specialized assume calls make them 1/3rd bigger and literally save a single cycle in rare cases - they don’t pull their weight (i.e., exactly what you said about being too concerned with rare use cases and ignoring general usability – we are completely on the same page on that – everybody else isn’t though).

mnemnion · June 5, 2024, 5:28pm

In my opinion, the stronger argument for the AssumeCapacity function variants isn’t shaving a cycle or two. It doesn’t take deep compiler magic to see a catch unreachable and use that to erase all the code which could throw the associated error. This is actually a bedrock optimization, and I would be upset if I found out the compiler can’t perform it. I specifically told the compiler that something can’t happen, and it still checks in release mode??

Rather, the strong argument is that AssumeCapacity is explicit, it tells anyone reading the code what’s going on. Zig standard evidently favors more functions, and is willing to use a bunch of full words (and long ones!) to provide that explicit control.

Why appendUnalignedSliceAssumeCapacity and not appendSlice(..., .unaligned) catch unreachable?

This is a deliberate design choice, whether or not it’s the correct one.

A strong argument against this approach needs to demonstrate that it understands that the choice is deliberate, that it’s based on perceived benefits, which put that API on the correct side of the tradeoff matrix.

Quite aside from the question of explicit verbosity in the API, it also needs to be addressed that the methods which don’t AssumeCapacity will allocate when full. They don’t just throw an error if capacity isn’t present: they either use their allocation pointer, or you must provide one.

So the semantics are meaningfully different, and this practically demands two functions. I don’t care much if the name used for the one which doesn’t allocate is perfect, personally, but I haven’t seen a proposal to actually shrink the number of functions involved, and I don’t think a reasonable one can be made.

So you could have append, with a capacity error suppressed with catch unreachable, and appendWithAllocation, which behaves like append does now. But I don’t think that changes much.