Building a Redis Clone in Zig Newsletter

2 Likes

Good job, some critiques:

If you don’t need functionality specific to the implementation File.Writer then you should be storing the interface pointer *Io.Writer.
The solution to the dangling pointer from creating the writer locally, is to not create it locally, take it as a parameter to init.

You already have a custom writer (RDB writer), even if it doesn’t implement the Io.Writer interface, you shouldn’t make another one (especially one that implements the interface) just for a running CRC. Fewer interface layers means less virtual calls, means less call overhead and less fragmented code.
You can just have a helper function to wrap the call to the writer, this would also remove all those interface accesses you don’t like, though my previous suggestion also does that.

Regarding the interface buffers, only the end of a chain should have a buffer (generally speaking). Ofc benchmarks outweigh that, I do see merit to batching CRC updates.

You also break the contract of drain by passing the buffer as an argument

    /// `buffer[0..end]` is consumed first, followed by each slice of `data` in
    /// order. Elements of `data` may alias each other 
    /// **but may not alias `buffer`**.

It is expected to deal with the buffer itself, and the data is explicitly not allowed to alias the buffer.
Unless you are manually checking before every write if it will cause a drain and calling flush first instead, if you’re not doing that then you are not writing the buffered data as write just calls drain without passing the buffer as data since it is assumed drain will handle the buffer.
It’s also possible you simply call flush before you ever get to that point, which is probably how you missed that bug in testing (always test limits).

I read the previous parts and saw some oversights and incorrect statements/assumptions, I’ll put those in a separate comment.

2 Likes

Thanks for the feedback! I updated the code accordingly. That’s good to know!

1 Like

Forgot to do that other comment about your previous parts :sweat_smile:

Part 1

Hash Tables

Zig’s standard library provides a StringHashMap , which is a good starting point. However, we need to ensure that our keys and values are properly managed in terms of memory ownership. For that, we can use StringHashMapUnmanaged , which allows us to define custom allocators and manage memory more explicitly.

There is no difference between them in terms of ownership, or ability to use custom allocators.
The difference is unmanaged containers don’t store their allocator, instead requiring you to pass it to functions that might allocate, this makes it easier to see where you allocate and encourages you to allocate upfront if possible.

On that note, the previous section ‘Memory Management: The Hard Part’:

const Store = struct {
   map: std.StringHashMap([]const u8),
   allocator: std.mem.Allocator,

Stores a managed StringHashMap and an Allocator, which effectively stores the allocator twice.

Also, your OptmisedHashMap is just a StringArrayHashMap, until you experiment with different hash algorithms in part 2, then it’s different.

Part 2

You can see the benchmark code and results in this GitHub Gist. Long story short, for strings shorter than 45 bytes, std.mem.eql(u8) is the fastest option. For strings between 45 and 128 bytes, a 16-byte SIMD using XOR is the fastest; it reaches an average of 850 million operations per second on my M4 MacBook Pro.

std.mem.eql does already do SIMD, but since you made a faster implementation you should make a PR, or an issue.

Intern Strings

// Map of interned strings
   interned_strings: std.StringHashMapUnmanaged([]const u8),

   fn internString(self: *Store, str: []const u8) ![]const u8 {
       assert(str.len > 0);
       const gop = try self.interned_strings.getOrPut(self.base_allocator, str);
       if (!gop.found_existing) {
           const owned_str = try self.dupeString(str);
           gop.key_ptr.* = owned_str;
           gop.value_ptr.* = owned_str;
       }
       return gop.value_ptr.*;
   }

Since the key and value are the same, use StringHashMapUnmanaged(void), so you only have to store one of them. This will reduce the allocated memory.

2 Likes