Unfortunately no. The actual use case is not country codes, but just like country codes, there are numbers that don’t map to anything.
Plus the numbers that will serve as keys could be very large.
So I think with these two constraints, an array won’t suffice.
Another solution is to create lookup tables at comptime. If the keys are unique, you can sort them and use binary search at runtime. This is the most memory-compact way I think this could be solved, but it only supports key→value lookups, and the keys have to be of a known fixed size. If the values are also a known fixed size, you can skip the second table entirely.
const std = @import("std");
const assert = std.debug.assert;
const eql = std.ascii.eqlIgnoreCase;

const NumType = u8;
const Entry = struct { NumType, []const u8 };

const data = [_]Entry{
    .{ 2, "hello" },
    .{ 3, "world" },
    .{ 9, "greece" },
    .{ 12, "usa" },
    .{ 13, "antarctica" },
};

const Key = struct {
    val: NumType = 0,
    off: usize = 0,
    len: usize = 0,
};

const ALL_STR_SIZE = get_str_size(data);
const both = build_lookups(data);
const keys = both[0];
const strs = both[1];

fn get_str_size(comptime a: [data.len]Entry) usize {
    comptime var c: usize = 0;
    inline for (a) |x| c += x[1].len;
    return c;
}

fn build_lookups(comptime ar: [data.len]Entry) struct { [data.len]Key, [ALL_STR_SIZE]u8 } {
    comptime var k: usize = 0;
    comptime var z: usize = 0;
    var a = [_]Key{.{}} ** ar.len;
    var b = [_]u8{0} ** ALL_STR_SIZE;
    inline for (ar, 0..) |x, i| {
        k = x[1].len;
        inline for (0..k) |j| b[z + j] = x[1][j];
        a[i] = .{ .val = x[0], .off = z, .len = k };
        z += k;
    }
    return .{ a, b };
}

// Switch from binary search to a linear scan below this many candidates.
const LNS_JUNC = 64;

fn lookup(val: NumType) ?[]const u8 {
    var bot: usize = 0;
    var top: usize = keys.len - 1;
    var mid: usize = top >> 1;
    while (true) {
        const k = keys[mid];
        if (val == k.val) return strs[k.off .. k.off + k.len];
        if (bot >= top) return null;
        if (top - bot <= LNS_JUNC) {
            return for (keys[bot .. top + 1]) |i| {
                if (i.val == val) break strs[i.off .. i.off + i.len];
            } else null;
        }
        if (val > k.val) {
            bot = mid + 1;
        } else {
            top = mid - 1;
        }
        mid = ((top - bot) >> 1) + bot;
    }
}

test "lookups" {
    assert(eql(lookup(2).?, "hello"));
    assert(eql(lookup(3).?, "world"));
    assert(eql(lookup(9).?, "greece"));
    assert(lookup(4) == null);
    assert(lookup(5) == null);
    assert(lookup(6) == null);
    assert(eql(lookup(12).?, "usa"));
    assert(eql(lookup(13).?, "antarctica"));
}
Why do you have to find a way to do this? Is it not a valid option to just write whatever code you want?
Use a switch statement. Use a perfect hash function. Some people enjoy writing trie implementations, that’s an option too. You’re a programmer, aren’t you? You can solve this in 100 different ways.
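For the small numeric example in the post above, the switch option might look something like this (a minimal sketch using that post’s example data):

```zig
const std = @import("std");

// A switch over the sparse numeric keys; the compiler is free to lower
// this to a jump table or a comparison tree, and the `else` branch
// naturally handles all the numbers that map to nothing.
fn lookup(val: u8) ?[]const u8 {
    return switch (val) {
        2 => "hello",
        3 => "world",
        9 => "greece",
        12 => "usa",
        13 => "antarctica",
        else => null,
    };
}

test "switch lookup" {
    try std.testing.expectEqualStrings("hello", lookup(2).?);
    try std.testing.expect(lookup(4) == null);
}
```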
A catch-all ComptimeHashMap that can be used for anything is convenient, but bespoke solutions tailored to your specific use-case are likely to be better. This is even true for StaticStringMap (previously named ComptimeStringMap), since it only actually has a benefit when the data has keys of varying string lengths (the optimization is just to sort the keys by length and then only compare equal-length strings in get).
As an example, here’s some ISO3166 ‘alpha2 → country’ lookup code I wrote but never ended up using (e.g. "be" → .belgium). It uses an array of type [26 * 26]Country and then @as(u16, a) * 26 + b to calculate the lookup index for an alpha2 code (where a and b are lowercase_char - 'a'). If I use the same alpha2_codes data with a StaticStringMap like so:
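Something along these lines (a sketch only; `Country` and `alpha2_codes` stand in for the definitions in the linked file, trimmed to two entries here):

```zig
const std = @import("std");

const Country = enum { belgium, greece };

// Hypothetical stand-in for the full alpha2_codes data.
const alpha2_codes = .{
    .{ "be", Country.belgium },
    .{ "gr", Country.greece },
};

// StaticStringMap is built entirely at comptime; no allocation.
const map = std.StaticStringMap(Country).initComptime(alpha2_codes);

test "alpha2 lookup" {
    try std.testing.expect(map.get("be").? == .belgium);
    try std.testing.expect(map.get("xx") == null);
}
```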
it works fine, but performs much worse, since the StaticStringMap is just doing a linear lookup because all the keys are of length 2:
Benchmark 1: array.exe
Time (mean ± σ): 29.6 ms ± 0.6 ms [User: 30.2 ms, System: 4.8 ms]
Range (min … max): 28.4 ms … 32.2 ms 61 runs
Benchmark 2: map.exe
Time (mean ± σ): 180.0 ms ± 1.1 ms [User: 175.8 ms, System: 2.6 ms]
Range (min … max): 178.9 ms … 183.5 ms 15 runs
Summary
'array.exe' ran
6.09 ± 0.13 times faster than 'map.exe'
Benchmark code
const std = @import("std");
const iso3166 = @import("iso3166.zig");

pub fn main() !void {
    const iterations = 1_000_000;
    var rand = std.Random.DefaultPrng.init(0);
    const r = rand.random();
    const max_len = 256;
    var buf: [max_len]u8 = undefined;
    var num_invalid: u64 = 0;
    var i: u64 = 0;
    while (i < iterations) : (i += 1) {
        const len = if (r.float(f32) >= 0.01) 2 else r.uintAtMost(usize, max_len);
        const code = buf[0..len];
        const alpha_only = r.float(f32) < 0.95;
        if (alpha_only) {
            for (code) |*c| {
                // uintLessThan keeps the value in 0..25 so the result is a letter
                // (uintAtMost would include 26 and occasionally produce '{' or '[').
                c.* = r.uintLessThan(u8, 26);
                c.* += if (r.boolean()) 'a' else 'A';
            }
        } else {
            r.bytes(code);
        }
        _ = iso3166.Country.fromAlpha2StaticStringMap(code) catch {
            num_invalid += 1;
        };
    }
    std.debug.print("num_invalid: {}\n", .{num_invalid});
}
I’m sure there are ways to make this lookup even faster, but I hope it shows what I’m trying to get at.
There are also some fine data structures in the std.enums namespace, which many readers may not be aware of. None of them allocate, making them appropriate for comptime. I’ve found them quite useful a number of times.
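As a small taste (a sketch with made-up data; std.enums.EnumArray is a fixed-size array indexed directly by the enum tag):

```zig
const std = @import("std");

const Fruit = enum { apple, banana, cherry };

// EnumArray stores one value per enum field, indexed by tag value:
// no hashing, no allocation, usable at comptime.
const prices = std.enums.EnumArray(Fruit, u32).init(.{
    .apple = 3,
    .banana = 2,
    .cherry = 5,
});

test "enum array" {
    try std.testing.expect(prices.get(.banana) == 2);
}
```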
I was curious to see how my static-map compared to your hand-rolled lookup, so I added a benchmark here. It looks like the hand-rolled lookup is about 25% faster in this benchmark, which does 100_000 iterations searching for two random letters.
This has the heavy caveat that it will panic/cause UB if val doesn’t have an associated enum field (and the @tagName will panic/cause UB if Map is non-exhaustive and there’s no associated enum field with that value).
My assumption would be that a function with a switch would be fairly performant for enum → T conversion, but I haven’t checked how it performs vs std.enums.EnumMap.
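The switch version I have in mind would be something like this (a sketch; the enum and values are made up):

```zig
const std = @import("std");

const Fruit = enum { apple, banana, cherry };

// A switch over an exhaustive enum needs no else branch; the compiler
// can lower it to a simple table lookup on the tag value.
fn price(f: Fruit) u32 {
    return switch (f) {
        .apple => 3,
        .banana => 2,
        .cherry => 5,
    };
}

test "switch price" {
    try std.testing.expect(price(.cherry) == 5);
}
```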
Ah that’s true. I added a specialized, case-insensitive map, modified the benchmark alphabet to include upper-case letters, and doubled that map’s capacity. Now I’m able to slightly edge out the hand-rolled version.
I wonder if the hand-rolled code would benefit from doing the lower-casing as I’ve done (by or-ing with 32, as in key[0] | 32). Not sure, as I didn’t check; just speculating as to why mine might be slightly faster.
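For reference, the trick relies on ASCII letters differing only in bit 0x20 (a sketch; note it also folds some non-letter characters, so it isn’t a full lower-casing on its own):

```zig
const std = @import("std");

fn foldCase(c: u8) u8 {
    // Maps 'A'..'Z' onto 'a'..'z' in one instruction, but also maps
    // some non-letters (e.g. '@' 0x40 becomes '`' 0x60), so a range
    // check is still needed if arbitrary bytes can appear.
    return c | 32;
}

test "fold case" {
    try std.testing.expect(foldCase('A') == 'a');
    try std.testing.expect(foldCase('a') == 'a');
}
```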
That range comparison will not return early for some characters outside [a-z][A-Z] – just a caveat, I did not read the details to find out if this might be a problem.