Ezicode — a Unicode library for Zig


Released v0.1.0 of ezi-code, a Unicode library for Zig. It has no dependencies; the UCD tables are generated into Zig source and committed to the repository. https://github.com/shaik-abdul-thouhid/ezi-code
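The post doesn't describe how the generated tables are laid out. A common layout for generated UCD property tables is a two-stage lookup: the code point space is cut into fixed-size blocks, and identical blocks are stored only once. The sketch below shows that general technique in Python. It is an assumption about what such tables can look like, not a description of ezi-code's generator; `BLOCK_SIZE`, `build_two_stage` and `lookup` are hypothetical names, and Python's own `unicodedata` stands in for a parsed `UnicodeData.txt`.

```python
import unicodedata

# Hypothetical block size; real generators usually tune this per property.
BLOCK_SIZE = 256


def build_two_stage(values):
    """Split a per-code-point value list into a two-stage table.

    stage1[i] is the offset in stage2 where block i starts; blocks with
    identical contents are stored in stage2 only once.
    """
    stage1 = []
    stage2 = []
    offsets = {}  # block contents -> offset of that block in stage2
    for start in range(0, len(values), BLOCK_SIZE):
        block = tuple(values[start:start + BLOCK_SIZE])
        if block not in offsets:
            offsets[block] = len(stage2)
            stage2.extend(block)
        stage1.append(offsets[block])
    return stage1, stage2


def lookup(stage1, stage2, cp):
    """Two array reads: find the block's offset, then index within the block."""
    return stage2[stage1[cp // BLOCK_SIZE] + cp % BLOCK_SIZE]


# General_Category for planes 0-2, taken from Python's bundled UCD copy.
values = [unicodedata.category(chr(cp)) for cp in range(0x30000)]
stage1, stage2 = build_two_stage(values)
print(len(values), "entries ->", len(stage1) + len(stage2))
print(lookup(stage1, stage2, ord("A")))  # Lu
```

Large uniform ranges (CJK ideographs, Hangul syllables, private use, unassigned space) collapse into a single shared block, which is what keeps generated tables small enough to commit as source.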

Three layers:

  • encoding — UTF-8, UTF-16, UTF-32 codecs. Strict / unchecked / lossy decode flavours.
  • transcoding — cross-encoding converters and a chunked UTF-8 stream decoder.
  • unicode — properties and algorithms backed by the UCD.
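The post doesn't show the stream decoder's API, so here is a Python illustration of the problem a chunked UTF-8 decoder solves: a multi-byte sequence can be split across chunk boundaries, so the decoder must buffer the incomplete tail until the next chunk arrives. Python's `codecs.getincrementaldecoder` does that buffering; its `errors="strict"` and `errors="replace"` modes roughly correspond to the strict and lossy flavours (Python has no unchecked mode). `decode_chunks` is a hypothetical helper, not part of ezi-code.

```python
import codecs


def decode_chunks(chunks, errors="strict"):
    """Decode an iterable of UTF-8 byte chunks into one string.

    The incremental decoder holds back an incomplete multi-byte sequence at
    the end of a chunk and completes it with bytes from the next chunk.
    errors="strict" raises UnicodeDecodeError on ill-formed input;
    errors="replace" substitutes U+FFFD instead (a lossy decode).
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors=errors)
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the buffer: a sequence cut off at end of input is ill-formed.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


data = "héllo €".encode("utf-8")
# The split points fall inside both 'é' (2 bytes) and '€' (3 bytes).
print(decode_chunks([data[:2], data[2:8], data[8:]]) == "héllo €")  # True
print(decode_chunks([b"ab\xff", b"cd"], errors="replace") == "ab\ufffdcd")  # True
```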

The unicode layer covers:

  • General Category / Bidi Class / Canonical Combining Class (CCC) / Derived Core Properties
  • casing (incl. Turkic)
  • normalization (NFC/NFD/NFKC/NFKD + streaming Normalizer + Quick_Check)
  • segmentation (grapheme / word / sentence per UAX #29, line per UAX #14)
  • East Asian Width
  • Script + Script_Extensions
  • full UAX #9 bidi, including reordering
  • Numeric_Type / Numeric_Value
  • Blocks
  • Hangul (with algorithmic composition)
  • Derived Age (tags each code point with the Unicode version in which it was first assigned)
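The algorithmic Hangul composition listed above comes from the Hangul syllable arithmetic in the Unicode Standard (section 3.12, Conjoining Jamo Behavior): precomposed syllables are computed from jamo indices, so no table is needed for them. A Python sketch of that arithmetic follows; the function names are illustrative and not ezi-code's API.

```python
import unicodedata

# Constants from the Hangul syllable algorithm (Unicode Standard, section 3.12).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(cp):
    """Return the L V [T] jamo of a precomposed syllable, or None."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return None
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    return [l, v] if t == T_BASE else [l, v, t]


def compose_hangul(jamo):
    """Compose an L V [T] jamo sequence into one syllable, or return None."""
    if len(jamo) not in (2, 3):
        return None
    l_index = jamo[0] - L_BASE
    v_index = jamo[1] - V_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        return None
    s = S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
    if len(jamo) == 3:
        t_index = jamo[2] - T_BASE
        if not 0 < t_index < T_COUNT:  # T_BASE itself is not a trailing consonant
            return None
        s += t_index
    return s


han = 0xD55C  # HANGUL SYLLABLE HAN
print([hex(j) for j in decompose_hangul(han)])  # ['0x1112', '0x1161', '0x11ab']
print(compose_hangul(decompose_hangul(han)) == han)  # True
# Cross-check against Python's own normalizer.
print(unicodedata.normalize("NFD", chr(han)) == "".join(map(chr, decompose_hangul(han))))  # True
```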

Conformance: tested against GraphemeBreakTest.txt, WordBreakTest.txt, SentenceBreakTest.txt, LineBreakTest.txt, and NormalizationTest.txt (enabled by a build flag). Bidi is covered by an adversarial UAX #9 suite organized by rule number.
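The four break test files share one line format: hex code points separated by `÷` (break allowed) or `×` (no break), followed by a `#` comment. NormalizationTest.txt uses a different, semicolon-separated layout and is not covered here. A minimal Python sketch of a parser that turns one break-test line into the expected segments, which a harness could then compare against a segmenter's output; `parse_break_test_line` is a hypothetical name, and the files should be read as UTF-8 because of the `÷` and `×` markers.

```python
def parse_break_test_line(line):
    """Parse one line of a GraphemeBreakTest.txt-style file.

    Returns the expected segments as strings, or None for blank or
    comment-only lines. '÷' marks an allowed break, '×' a prohibited one.
    """
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    segments, current = [], []
    for token in data.split():
        if token == "÷":
            if current:
                segments.append("".join(current))
                current = []
        elif token != "×":
            current.append(chr(int(token, 16)))
    if current:  # defensive: data lines normally end with '÷'
        segments.append("".join(current))
    return segments


line = "÷ 0020 × 0308 ÷ 0020 ÷\t#  ÷ [0.2] SPACE (Other) × [9.0] ..."
print(parse_break_test_line(line) == [" \u0308", " "])  # True
```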

The aim is to cover the whole UCD as needed for text processing; CLDR (locale data) is deliberately out of scope.

Supported Zig versions

Since the 0.17 release is right around the corner, this tracks the master branch; it builds correctly on 0.17.0-dev.607+456b2ec07.

AI / LLM usage disclosure

All the tests were generated by an LLM, and more than half of the generator code and lookups were LLM-generated as well.
