Thinking about if it's a good idea to add a function to observe ``std.fifo.LinearFifos`` memory

markus · August 2, 2024, 3:01pm

Motivation

I am currently playing around with caching of recently used files and thought a RingBuffer / FIFO may be a good idea for that.
I thought I would simply put freshly opened or used files at the top of the buffer and when opening a new one I would first check if the buffer already contained it and if it did just return that one.

The problem

I first looked at std.RingBuffer which does have a function that (almost) does what I want it to (give me two slices that I can use to observe it’s contents), but sadly the RingBuffer could only store u8s.

Then this issue lead me to std.fifo.LinearFifo. It does everything I expect from a RingBuffer / FIFO, except provide me an equivalent to std.RingBuffer.sliceLast

The idea

Add a function like

pub fn slice(self: Self) struct {
    first: []const T,
    second: []const T,
}

similar to std.RingBuffer.sliceLast

As far as I can tell, there also aren’t any real footguns associated with this. The only thing I can think of is the memory getting overwritten when using the slices from an older state, but that exists in std.ArrayList etc. too.

Let me know you thoughts!

LucasSantos91 · August 2, 2024, 3:54pm

What are you trying to cache, if not bytes?

Sze · August 2, 2024, 4:50pm

For something like an LRU cache, wouldn’t you want something more like a priority queue, where the older the element is it would get a higher priority (e.g. lower timestamp higher priority), that way you could use the elements from the queue es elements to be evicted if the cache is full.

If the cache is big enough you also would want to have a hash Index for the elements so that you have O(1) lookups instead of O(n).

Then when accessing an existing element you can also update it’s timestamp causing it to get a new higher priority, keeping it in the cache. This means cache items can remain in the cache for a long time if they are popular enough, but also means that your cache items need to be idempotent/remain-valid, because they won’t be regenerated until they were evicted, by more popular entries.

Of course you also could add a TTL (time to live) timestamp after which the entry always gets evicted and refetched.

What you described sounds more like a debounce mechanism to me, because it seemed like it would evict entries round-robin style, but maybe I am misunderstanding you.

Even if that is what you are after, it might benefit from a (hash) index.

I think it would be good if you described your goal and constraints a bit more, so that people can make more helpful suggestions. If you only have 30 entries max you don’t need an index, if it’s 100000 you definitely should have one. Are the cache elements unchanging, or do they go stale?

markus · August 2, 2024, 4:57pm

Structs that have been parsed from a file

markus · August 2, 2024, 5:29pm

Okay here comes more context:
I watched some videos about how databases work and really just wanna play around with it a bit.
Data is stored in files which are loaded into a struct called page, and managed by a page manager which caches them and provides access.

I am aware that a ringbuffer is suboptimal here when it comes to larger sets, but the sets remain very small and I think that (i know measuring is the only truly valid argument here, “think” is never a strong point) that with that size the performance may be ruffly equivalent or even better.

I originally chose a fifo because I felt its property to avoid having to memcpy everything by 1 index every time something is popped may be more performant, which as far as I can tell, the priorityqueue does not have (i may be wrong, i just briefly looked at the implementation)

Regardless, I think that the function I proposed may be useful to some people using the fifo

markus · August 2, 2024, 5:36pm

Thank for telling me about priorityqueue though, I never knew it existed in the zig std

Am I correct in my understanding that its O(n) though?

Sze · August 2, 2024, 6:02pm

I haven’t looked at the std.PriorityQueue implementation in detail (just now had a short look at it) and was talking more in general about priority queues, but I think it uses and maintains a Heap so it is likely that insert is O(log n) and accessing the priority element is O(1), but I am currently too lazy to go through the code to check.

I guess I would be more likely to run a few benchmarks with workloads I am interested in using and maybe comparing that to using something else.

There is also the Treap which also could be useful, depending on what you want to create exactly.

I think it depends on what kinds of operations are most used by your code and also depends on the ratio between reading and writing, cache hit rate and stuff like that. Basically it is difficult to make general remarks, without actually measuring stuff.

markus · August 2, 2024, 6:54pm

Oh yea you’re correct. In priority queue I just saw a slice and a field called cap and thought “That must be an O(n) thing”, but I think its a heap too.

I dont know the exact operations I will be performing myself yet, at the moment I am mostly shoving text around and making sure it compiles.

I will switch to priorityqueue for now though, as it does seem to be more convenient.

I appreciate the help!

mnemnion · August 2, 2024, 7:19pm

I ran across a paper recently on a cache maintenance strategy which looked like would be a good fit for Zig (not the standard library per se, just Zig programming).

The work is directly applicable to the problem space you’re exploring, so I thought I’d share it. The PriorityQueue is a good candidate for implementing the reinsertion strategy sketched out for the main cache queue.

Sze · August 2, 2024, 9:00pm

Looks like an interesting read, will have to take a closer look later.

markus · August 3, 2024, 7:47am

Lol I should add that this was my very first attempt at doing something with caching, I did no research whatsoever and had no idea what I am doing.

Im surprised I got so close to something people actually do in the first place.