BORZIG: a CBOR codec library

CBOR, the Concise Binary Object Representation, is my favorite binary encoding. Like MessagePack, it’s smaller and faster than JSON, and it has a well-defined spec: https://cbor.io/.

Source: https://codeberg.org/javier_guerra/borzig

So far, it’s optimized for ease of use rather than speed. I iterated a few times on how to use it for both streaming and in-place decoding, but the API has been stable for a while now.

Like other (de)serialisation libraries, it relies heavily on comptime, which makes it easy to go between structs and byte strings while preserving type safety.

Comments welcome!

Quick question @Javier: say I have an array of objects, where each object is of the form {"id": 11, "name": "gonzo"}. If I encode this object with CBOR, can I avoid repeating the key name strings ("id", "name") for each object in the array? I browsed around and could not find a quick answer.

Technically you could transform {"id": id, "name": name}, ... to [{"id":0, "name":1}, [11, "gonzo"], ...] before encoding and back after decoding.
Possibly adding some sort of marker to distinguish it from other values.

Looking at the spec, though, I can’t find any record-like type that lets you specify the keys once for multiple instances. The spec does leave some values unassigned, so you could possibly define an extension that adds something like that.
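
To make the key-table transformation above concrete, here is a minimal sketch in Python. It only reshapes the data before encoding and after decoding, so it works with any CBOR codec. The names pack_records and unpack_records are made up for illustration and are not part of borzig or of the CBOR spec, and the sketch assumes every record has exactly the same keys.

```python
def pack_records(records):
    """Turn a list of same-shaped dicts into [key_table, row, row, ...]."""
    if not records:
        return [{}]
    keys = list(records[0])
    # Map each key name to its column position, e.g. {"id": 0, "name": 1}.
    key_table = {key: index for index, key in enumerate(keys)}
    rows = []
    for record in records:
        if set(record) != set(keys):
            raise ValueError("all records must share the same keys")
        rows.append([record[key] for key in keys])
    return [key_table, *rows]


def unpack_records(packed):
    """Inverse of pack_records: rebuild the list of dicts."""
    key_table, *rows = packed
    # Recover the column order from the positions stored in the key table.
    keys = sorted(key_table, key=key_table.get)
    return [dict(zip(keys, row)) for row in rows]
```

The key names then appear once in the encoded output instead of once per object. A marker to tell the packed form apart from an ordinary array, as suggested above, would still have to be added separately.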

Looks nice! I’m a big fan of cbor myself and I use it in most of my projects.
Two features I suggest you add that I use a lot with my own cbor library:

  1. Streaming conversion to/from json. Great for tracing, logging and working with json apis.

  2. Pattern matching on the encoded cbor. Great for using cbor in messaging systems and protocols and often faster than fully decoding. This would be very similar to your StreamParser.parse, but instead of decoding the entire cbor stream to the target type you use a target value as a template to match and extract values against.

On a side note, stringify and parse are not terms I would use with cbor. It’s binary, and in my mind those terms imply some sort of plain text. I would use terms like encode, decode, extract, match etc. The same goes for prefix: the cbor rfc calls it the data item head, or you could just say the ‘type bytes’. It is not really a prefix, because in many cases it contains the entire data item.
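
To illustrate the point about the head sometimes being the whole data item, here is a rough Python sketch of reading a head as laid out in RFC 8949: the top 3 bits of the initial byte are the major type, and the low 5 bits either hold the argument directly or say how many bytes follow. The function name decode_head is made up for this example and is not taken from any library.

```python
def decode_head(data):
    """Return (major_type, argument, head_length) for the item at data[0]."""
    initial = data[0]
    major_type = initial >> 5   # top 3 bits
    info = initial & 0x1F       # low 5 bits, the "additional information"
    if info < 24:
        # The argument lives in the initial byte itself.
        return major_type, info, 1
    if info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 following bytes
        if len(data) < 1 + size:
            raise ValueError("truncated head")
        argument = int.from_bytes(data[1:1 + size], "big")
        return major_type, argument, 1 + size
    if info == 31:
        # Indefinite-length marker (or the "break" code for major type 7).
        return major_type, None, 1
    raise ValueError("reserved additional information value")
```

For the integer 11 the encoding is the single byte 0x0b, so the head is the complete data item. For the text string "gonzo" the head 0x65 only carries the type and the length 5, and the payload follows. For major type 7 the argument is a simple value or the raw bits of a float rather than a count, which this sketch does not interpret.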

Right, this is the usual trick for this sort of thing. Wondering if it was directly supported in CBOR. This is one of the things I like about ProtocolBuffers (but there are others I hate).

can I avoid repeating the key name strings ("id", "name") for each object in the array?

For decoding, it just works. Since Zig preserves struct field order, you can simply decode a (heterogeneous) array into a struct, and each array element will be decoded into the corresponding field.

For encoding, you could use a tuple, which is encoded as an array, or use custom encoding (by adding an .encode method to your struct). I might add a helper function to make that option easier.

[EDIT: just did that! custom “struct as array” encoder]
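
To show what the array form saves on the wire, here is a small Python sketch. It is a hand-written toy encoder covering only unsigned integers, text strings, arrays and maps, and it is not borzig’s API; encode_head and encode are names invented for the example.

```python
def encode_head(major_type, argument):
    """Encode a CBOR head: major type plus an unsigned argument."""
    if argument < 24:
        return bytes([major_type << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major_type << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode(value):
    """Encode a small subset of Python values as definite-length CBOR."""
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, int) and value >= 0:
        return encode_head(0, value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return encode_head(3, len(raw)) + raw
    if isinstance(value, (list, tuple)):
        return encode_head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(encode(k) + encode(v) for k, v in value.items())
        return encode_head(5, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


as_map = encode({"id": 11, "name": "gonzo"})   # 16 bytes
as_array = encode([11, "gonzo"])               # 8 bytes
```

For this record the map form is 16 bytes and the array form is 8, because the key strings "id" and "name" are no longer repeated in every element.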

I’ve never used it, even on ProtocolBuffers, but I see how CBOR->JSON could be easy and efficient. I’ll add it to the TODO list.

For the other way around… It seems a much heavier task, and would either require random access to the JSON source (no streaming), or make every array and string in the resulting CBOR an indefinite-length element. While that’s supported, it negates the best advantages of binary encoding, and makes non-allocating decoding impossible.
Let’s say it’s much lower in the TODO list.
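
As a rough illustration of the trade-off, here is a Python sketch of the two array encodings. A definite-length array puts the element count in the head, so the encoder has to know the count before it writes anything. An indefinite-length array starts with 0x9f and ends with the 0xff break byte, so it can be written while streaming. The function names are invented for this example, and it only handles unsigned integers below 24 and fewer than 24 elements to keep it short.

```python
def encode_small_uint(value):
    """Encode an unsigned integer 0..23, which fits in a single byte."""
    if not 0 <= value < 24:
        raise ValueError("this sketch only handles integers 0..23")
    return bytes([value])


def definite_array(items):
    """Needs the whole input up front, because the count goes in the head."""
    items = list(items)  # buffering step: a pure stream cannot skip this
    if len(items) >= 24:
        raise ValueError("this sketch only handles fewer than 24 elements")
    head = bytes([4 << 5 | len(items)])
    return head + b"".join(encode_small_uint(i) for i in items)


def indefinite_array(items):
    """Yields chunks as elements arrive; no count is ever needed."""
    yield b"\x9f"  # array, indefinite length
    for item in items:
        yield encode_small_uint(item)
    yield b"\xff"  # "break" marks the end


streamed = b"".join(indefinite_array(iter([1, 2, 3])))
buffered = definite_array([1, 2, 3])
```

The streamed form costs one extra byte here, and a decoder cannot know how many elements to expect until it reaches the break, which is why decoding without allocation gets harder.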

Yes, I do have some uses for that feature. Stay tuned!

I mostly agree. Will consider renaming.