Sze
June 3, 2025, 7:20pm
There was also this older (and clever) implementation, which was used within the WASM namespace for a while and could still be useful for other things:
```zig
pub const StringTable = struct {
    /// Table that maps string offsets, which is used to de-duplicate strings.
    /// Rather than having the offset map to the data, the `StringContext` holds all bytes of the string.
    /// The strings are stored as a contiguous array where each string is zero-terminated.
    string_table: std.HashMapUnmanaged(
        u32,
        void,
        std.hash_map.StringIndexContext,
        std.hash_map.default_max_load_percentage,
    ) = .{},
    /// Holds the actual data of the string table.
    string_data: std.ArrayListUnmanaged(u8) = .{},

    /// Accepts a string and searches for a corresponding string.
    /// When found, de-duplicates the string and returns the existing offset instead.
    /// When the string is not found in the `string_table`, a new entry will be inserted
    /// and the new offset to its data will be returned.
    pub fn put(table: *StringTable, allocator: Allocator, string: []const u8) !u32 {
        const gop = try table.string_table.getOrPutContextAdapted(
            allocator,
            // … (the embedded file was truncated here)
```
The benefit of this variant is that it doesn't need to store any slices internally: the map only holds the start offset of each null-terminated string in the shared buffer, and a string's end is found by scanning for its terminating 0.
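To make the mechanism concrete, here is a minimal sketch under a few assumptions: Zig 0.14-era std APIs (`std.ArrayListUnmanaged`, `std.HashMapUnmanaged`, `getOrPutContextAdapted`), and hypothetical names (`OffsetStringTable`, `OffsetContext`, `SliceAdapter`, `get`) chosen only for illustration. The two contexts hand-roll roughly what `std.hash_map.StringIndexContext` and `StringIndexAdapter` provide: the map's keys are plain `u32` offsets, a stored offset is hashed by the 0-terminated bytes it points at, and a lookup slice is hashed directly, so both produce the same hash for the same string.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// Hypothetical sketch: a de-duplicating string table whose hash map stores
/// only u32 offsets into one contiguous buffer of 0-terminated strings.
pub const OffsetStringTable = struct {
    bytes: std.ArrayListUnmanaged(u8) = .empty,
    map: std.HashMapUnmanaged(u32, void, OffsetContext, std.hash_map.default_max_load_percentage) = .empty,

    /// Context for stored keys: an offset is hashed by the string it points at.
    /// It holds a pointer to the list itself, so buffer reallocations are fine.
    const OffsetContext = struct {
        bytes: *const std.ArrayListUnmanaged(u8),

        pub fn hash(ctx: OffsetContext, offset: u32) u64 {
            return std.hash.Wyhash.hash(0, std.mem.sliceTo(ctx.bytes.items[offset..], 0));
        }

        pub fn eql(_: OffsetContext, a: u32, b: u32) bool {
            // Each distinct string is stored once, so equal offsets <=> equal strings.
            return a == b;
        }
    };

    /// Adapter for probing the map with a slice before anything is copied.
    const SliceAdapter = struct {
        bytes: *const std.ArrayListUnmanaged(u8),

        pub fn hash(_: SliceAdapter, s: []const u8) u64 {
            // Must agree with OffsetContext.hash for the same string.
            return std.hash.Wyhash.hash(0, s);
        }

        pub fn eql(ctx: SliceAdapter, s: []const u8, offset: u32) bool {
            return std.mem.eql(u8, s, std.mem.sliceTo(ctx.bytes.items[offset..], 0));
        }
    };

    pub fn deinit(table: *OffsetStringTable, gpa: Allocator) void {
        table.bytes.deinit(gpa);
        table.map.deinit(gpa);
    }

    /// Returns the offset of `string`, appending it plus a 0 terminator only
    /// if it is new. `string` must not contain 0 or point into `table.bytes`.
    pub fn put(table: *OffsetStringTable, gpa: Allocator, string: []const u8) !u32 {
        std.debug.assert(std.mem.indexOfScalar(u8, string, 0) == null);
        // Reserve first, so no allocation can fail after the map has created
        // a new entry whose key is still undefined.
        try table.bytes.ensureUnusedCapacity(gpa, string.len + 1);
        const gop = try table.map.getOrPutContextAdapted(
            gpa,
            string,
            SliceAdapter{ .bytes = &table.bytes },
            OffsetContext{ .bytes = &table.bytes },
        );
        if (gop.found_existing) return gop.key_ptr.*;

        const offset: u32 = @intCast(table.bytes.items.len);
        table.bytes.appendSliceAssumeCapacity(string);
        table.bytes.appendAssumeCapacity(0);
        gop.key_ptr.* = offset;
        return offset;
    }

    /// Recovers the string at `offset` by scanning to its 0 terminator.
    pub fn get(table: *const OffsetStringTable, offset: u32) []const u8 {
        return std.mem.sliceTo(table.bytes.items[offset..], 0);
    }
};
```

Calling `put` twice with the same string returns the same offset, and the buffer only grows for new strings. The trade-off is that lengths are not stored anywhere, so `get` has to scan for the terminator.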
Here is a related Zig News post.
Ah, it turns out it is still used here:
```zig
buffer: std.ArrayListUnmanaged(u8) = .empty,
table: std.HashMapUnmanaged(u32, void, StringIndexContext, std.hash_map.default_max_load_percentage) = .empty,

pub fn deinit(self: *Self, gpa: Allocator) void {
    self.buffer.deinit(gpa);
    self.table.deinit(gpa);
}

pub fn insert(self: *Self, gpa: Allocator, string: []const u8) !u32 {
    const gop = try self.table.getOrPutContextAdapted(gpa, @as([]const u8, string), StringIndexAdapter{
        .bytes = &self.buffer,
    }, StringIndexContext{
        .bytes = &self.buffer,
    });
    // … (excerpt ends here)
```
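For comparison, here is a minimal sketch built directly on the std-provided `std.hash_map.StringIndexContext` and `StringIndexAdapter`, reusing the `buffer` and `table` field names from the excerpt above. It is a hypothetical completion, not the code that excerpt is cut from; `Strtab`, `find` and `get` are illustrative names, and it assumes Zig 0.14-era std APIs. It shows two extra things the offset layout allows: a read-only lookup through `getKeyAdapted`, which probes the map with a slice without inserting anything, and, because every entry is already 0-terminated in the buffer, returning a `[:0]const u8` that can be handed to C without copying.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;
const StringIndexAdapter = std.hash_map.StringIndexAdapter;
const StringIndexContext = std.hash_map.StringIndexContext;

/// Hypothetical sketch (not the excerpted file): an offset-keyed table
/// built on the contexts that std.hash_map already provides.
pub const Strtab = struct {
    buffer: std.ArrayListUnmanaged(u8) = .empty,
    table: std.HashMapUnmanaged(u32, void, StringIndexContext, std.hash_map.default_max_load_percentage) = .empty,

    pub fn deinit(self: *Strtab, gpa: Allocator) void {
        self.buffer.deinit(gpa);
        self.table.deinit(gpa);
    }

    /// Returns the offset of `string`, appending it only if it is new.
    pub fn insert(self: *Strtab, gpa: Allocator, string: []const u8) !u32 {
        // Reserve before getOrPut so nothing fails while the new key is undefined.
        try self.buffer.ensureUnusedCapacity(gpa, string.len + 1);
        const gop = try self.table.getOrPutContextAdapted(
            gpa,
            string,
            StringIndexAdapter{ .bytes = &self.buffer },
            StringIndexContext{ .bytes = &self.buffer },
        );
        if (gop.found_existing) return gop.key_ptr.*;
        const offset: u32 = @intCast(self.buffer.items.len);
        self.buffer.appendSliceAssumeCapacity(string);
        self.buffer.appendAssumeCapacity(0);
        gop.key_ptr.* = offset;
        return offset;
    }

    /// Read-only lookup: probes the map with the slice and never inserts.
    pub fn find(self: *const Strtab, string: []const u8) ?u32 {
        return self.table.getKeyAdapted(string, StringIndexAdapter{ .bytes = &self.buffer });
    }

    /// The stored bytes are already 0-terminated, so a sentinel slice can be
    /// returned without copying.
    pub fn get(self: *const Strtab, offset: u32) [:0]const u8 {
        const tail = self.buffer.items[offset..];
        const len = std.mem.indexOfScalar(u8, tail, 0).?;
        return tail[0..len :0];
    }
};
```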