How to provide assets (data) with a zig library?

Background: I want to be able to use zdt on Windows. Since Windows does not come with the IANA timezone database, I’ll need to provide it somehow together with the library.

My first thought was to embed the whole zoneinfo directory, containing the TZif files with the timezone rules (you can obtain/compile the latest version from here for example). However, there are some issues with this. For one, embedding files can be done at comptime, but iterating a directory has to happen at runtime, i.e. in build.zig (as I’ve tried here). And then this seems unnecessary in principle: if I have to ship the data with the library anyway, why the additional embed step? Why embed the whole database if a user will only need a handful of timezones in the end?

So my question is: is there a way to provide assets (data) with a zig library? The additional constraint here being that a user should be able to selectively use parts of the data.

I would provide a module/import with comptime functions to select the data the user wants, and it would still use @embedFile. This way you don’t limit yourself to target systems that have a filesystem, and you still give the user the option to write the data to the filesystem and load it at runtime if they want to.
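A minimal sketch of what such a comptime selection function could look like. The names `Zone`, `zones`, `find` and `zoneData` are made up for illustration and are not part of zdt, and the placeholder strings stand in for what would be `@embedFile(...)` results in a real library, so that the snippet compiles without a zoneinfo directory next to it.

```zig
const std = @import("std");

const Zone = struct { name: []const u8, data: []const u8 };

// Placeholder bytes keep the sketch self-contained. In a real library each
// `data` field would be something like @embedFile("zoneinfo/Europe/Berlin")
// (hypothetical path).
const zones = [_]Zone{
    .{ .name = "Europe/Berlin", .data = "TZif-placeholder-berlin" },
    .{ .name = "America/New_York", .data = "TZif-placeholder-new-york" },
    .{ .name = "Asia/Tokyo", .data = "TZif-placeholder-tokyo" },
};

// Linear search over the table; only ever evaluated at comptime.
fn find(comptime name: []const u8) ?[]const u8 {
    for (zones) |zone| {
        if (std.mem.eql(u8, zone.name, name)) return zone.data;
    }
    return null;
}

/// Returns the data for `name`; an unknown name fails the build.
pub fn zoneData(comptime name: []const u8) []const u8 {
    return comptime find(name) orelse
        @compileError("unknown time zone: " ++ name);
}

pub fn main() void {
    const berlin = zoneData("Europe/Berlin");
    std.debug.print("Europe/Berlin: {d} bytes\n", .{berlin.len});
}
```

Because the name is resolved at comptime, a typo in a zone name becomes a compile error rather than a runtime failure, and the user can still write the returned bytes to disk if they prefer loading at runtime.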

right, that seems like the way to go for comptime-known input. Will give it a try.

But what about runtime-known input? For example from a web page form? This feels similar to parsing user input; if the input format is comptime-known, things get a lot simpler…

For runtime input, you would have to provide an API in the library itself. While the library might not make network calls itself, it can still operate on data coming in through the application level.

Won’t this make it hard(er) to update the tz data?

which step specifically?

keeping tzdata up-to-date is a bit of an issue; providing it via the library requires a library update each time there is a tzdata update. Not sure if that is the most satisfactory solution but I can see it working out.

Typically you would automate the updates

Right, this is why I think it would be better to store the tz data on separate files, which can be updated without having to update the code.

One can find a memory-efficient binary representation of the TZ data, create a single artifact that covers all TZs, and memory-map it at runtime. Entries are then accessed by indexing into the mapped memory, for example using perfect hashing (a hash function on a known static set of strings that avoids hash collisions and holes).

do you know of an implementation of something similar maybe?

Actually, using a hash map was my initial idea. However I didn’t manage to create that at comptime yet, based on the directory walk through tzdata.

btw. another issue is the application side I guess; if an application bakes in a library, and the library requires an update at some point, how to notify the creator of the application? In the case of tzdata, I think this is even a language-agnostic problem…

For one, embedding files can be done at comptime, but iterating a directory needs to be done at runtime, i.e. in the build.zig

Looking at what you have done in build.zig in your example, I think a better way might be to add the generated data as an explicit build step.

You can use b.addExecutable() to create an executable (this can be a zig source file) used at build time. Then b.addRunArtifact() to run the executable in the build graph. I do this in Ziglua to generate files at build time, so take a look there for more info.

With this approach you could combine all of the zoneinfo data (and provide compile options to omit data) into a single file which can then be embedded.
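A rough sketch of such a build step, assuming the Zig 0.14 build API (the build system changes between releases, so field names may differ in other versions). The generator `tools/gen_tzdata.zig`, the `zoneinfo` input directory, the output name `tzdata.bin` and the import name `tzdata` are all hypothetical, and this is not taken from Ziglua or zdt.

```zig
// build.zig (sketch, assuming the Zig 0.14 build API)
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Hypothetical generator tool; it runs on the build host, not the target.
    const gen = b.addExecutable(.{
        .name = "gen_tzdata",
        .root_source_file = b.path("tools/gen_tzdata.zig"),
        .target = b.graph.host,
    });

    // Run it as part of the build graph: input directory in, one file out.
    const run_gen = b.addRunArtifact(gen);
    run_gen.addDirectoryArg(b.path("zoneinfo"));
    const tzdata_bin = run_gen.addOutputFileArg("tzdata.bin");

    const zdt = b.addModule("zdt", .{
        .root_source_file = b.path("src/zdt.zig"),
        .target = target,
        .optimize = optimize,
    });
    // Lets code inside the module use @embedFile("tzdata") for the generated file.
    zdt.addAnonymousImport("tzdata", .{ .root_source_file = tzdata_bin });
}
```

The generator receives the input directory and the output path as command-line arguments, so the directory walk happens in an ordinary runtime program, and the module only ever embeds the single generated file.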

is there a way to provide assets (data) with a zig library? The additional constraint here being that a user should be able to selectively use parts of the data.

From your original question and the following discussion I’m still not perfectly clear on your goal. Are you trying to make some files available alongside your “zdt” module to be used at runtime? Or do you want this data bundled inside your module?

thanks for pointing me to the ziglua example, I’ll have to look into that in more detail.

“Naturally” you would have one file per time zone. That’s how the IANA tz db is organized (the file format is TZif), and that’s what your typical Linux machine uses; the TZif files live, for example, in /usr/share/zoneinfo. As for libraries that provide datetime with timezone support, I’m not sure if it’s optimal to ship a directory with a bunch of files. I think Go ships with a zip file; not sure how Rust / chrono-tz does it, it seems they import Paul Eggert’s tz db as a submodule and go on from there.

Thanks for the clarifications, this helps a lot.

I guess your approach just depends on what you want to do. If you expect end users of zdt to distribute the timezone data alongside their executables, then they could do something like this:

const zdt = b.dependency("zdt", .{});
b.installDirectory(.{
    .source_dir = zdt.path("timedata/"),
    .install_dir = .prefix,
    .install_subdir = "timedata",
});

And then zdt could look for the data at runtime at some configured path. That seems a bit messy, but it would work. Or you can do something similar, but with a generated file (maybe combine the files into a .zip at build time and expose that as a module) and then the users of zdt can make sure that file is installed alongside their executables.

Or you can embed the data (in whatever format you choose) into your module doing something like I (and others) have already suggested above. Maybe this could even be only available if the target is Windows?

I guess there are a lot of approaches… and I’m not sure which is best. Hopefully one of those ideas helps!

For example, GNU gperf (gperf - GNU Project - Free Software Foundation (FSF)).
It generates a perfect hash function (no collisions) in C for a fixed set of strings. Hopefully, one can use translate-c on its output to produce a Zig equivalent, or port the whole of gperf to Zig (zperf, anyone? :) ). The point is to precompute and pre-hash as much as possible, so that only the lookup happens at run-time. BTW, if the keys change when the TZ data is updated, the perfect hash function needs to be regenerated.
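To illustrate the idea without gperf, here is a small sketch that brute-forces a collision-free seed at comptime for a fixed key set. This is not gperf's algorithm, and the result is a perfect but not minimal hash (the table has empty slots). The key list, `table_size`, and all function names are made up for the example.

```zig
const std = @import("std");

// Fixed key set, known at compile time (a tiny stand-in for the zone names).
const names = [_][]const u8{ "UTC", "Europe/Berlin", "America/New_York", "Asia/Tokyo" };
// More slots than keys, so a collision-free seed is easy to find.
const table_size = 8;

// Seeded FNV-1a; any reasonable string hash would do here.
fn hash(seed: u32, s: []const u8) u32 {
    var h: u32 = 2166136261 ^ seed;
    for (s) |c| {
        h ^= c;
        h *%= 16777619;
    }
    return h;
}

// Try seeds until every name lands in its own slot.
fn findSeed() u32 {
    @setEvalBranchQuota(100_000);
    var seed: u32 = 0;
    while (true) : (seed += 1) {
        var used = [_]bool{false} ** table_size;
        var collision = false;
        for (names) |n| {
            const slot = hash(seed, n) % table_size;
            if (used[slot]) {
                collision = true;
                break;
            }
            used[slot] = true;
        }
        if (!collision) return seed;
    }
}

// Container-level initializers are evaluated at comptime.
const perfect_seed = findSeed();

// Slot -> index into `names`; null marks an empty slot.
const table = blk: {
    var t = [_]?usize{null} ** table_size;
    for (names, 0..) |n, i| t[hash(perfect_seed, n) % table_size] = i;
    break :blk t;
};

/// Runtime lookup: one hash, one table read, one string comparison.
pub fn lookup(name: []const u8) ?usize {
    const i = table[hash(perfect_seed, name) % table_size] orelse return null;
    return if (std.mem.eql(u8, names[i], name)) i else null;
}

pub fn main() void {
    if (lookup("Asia/Tokyo")) |i| std.debug.print("{s} -> index {d}\n", .{ names[i], i });
}
```

The final string comparison is still needed, because an unknown name can hash into an occupied slot. If the key set changes, the seed and table are recomputed on the next build.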

Another approach can be storing TZ data in sqlite.

I was going to suggest exactly this. This would be my approach.

Million dollar question – what approach will @FObersteiner choose? :)

haha I’ll be happy already if I get this working in principle ^^ At the moment, the TZif parser is giving me trouble again, after switching from my system’s zoneinfo to the self-compiled version… more quirks to discover there first.

Thanks for all the input!

I dislike the idea of having a sqlite dependency just because I want to use some timezones or do some time calculations. If you have no other pre-existing reason to use sqlite, it is too big a dependency for something that doesn’t really need it.

Also, from my limited experience (admittedly with languages that aren’t fast themselves), sqlite can sometimes be quite slow. I think just compiling the data in, or reading it from a file, should be easier to make fast and have less overhead.