Minimum Viable Zig Regex

I do have some notion of trying a capture-group iterator. My plan gets a little tripped up on recursive captures, which is the main reason I haven’t tried to make it work.

Given the interest, I’m more likely to take a shot at it when I find some free time.


This should indeed emit an empty match. What I mean is that mvzr reports no match at all.


Ah yes I see what you mean.

v0.2.4 is up and fixes this. The logic wasn’t trying to match an empty string against the regex, which it clearly needs to do.

Thanks!

As a general note, let’s keep issues with mvzr on the issue tracking board, to keep the thread high-signal.


Hi,

I’m five months late, but I just found this showcase. I tried it in place of the PCRE2/POSIX regular expression libraries in some of my projects. With a few tweaks to my patterns, it works really well. Thank you for your work and for sharing it.


v0.2.5 is out. This is a bugfix release: ranges such as [\x00-\x08] are now parsed correctly.


We’ve reached v0.3.0!

This release improves multibyte handling. Codepoints are kept together, such that a regex like λ+ now matches "λλλ", and ranges and bytes expressed with \x syntax can cover the whole u8 range, not just the ASCII lower portion of it. With some care this allows for construction of useful character sets, so long as they happen to be dense in the Unicode codepoint spectrum.

You’ll need to do a deep dive on UTF-8 encoding to really make use of the high-byte character sets feature, but I expect this to substantively improve handling of, in particular, European languages which make use of accented Latin characters, Greek, or Cyrillic.
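To make the UTF-8 point concrete, here is a rough sketch in Python (not Zig, and not using mvzr) of how a dense codepoint range breaks down into byte-level sets. The helper `byte_classes` is a hypothetical name made up for this illustration, it only handles two-byte codepoints, and whether mvzr accepts the resulting strings in exactly this spelling is an assumption; check the README before relying on it.

```python
from collections import defaultdict


def byte_classes(first: str, last: str) -> list[str]:
    """Describe the codepoint range [first, last] as byte-level sets.

    Hypothetical helper for illustration only. It assumes every codepoint
    in the range encodes to exactly two UTF-8 bytes (U+0080..U+07FF).
    """
    tails_by_lead = defaultdict(list)
    for cp in range(ord(first), ord(last) + 1):
        encoded = chr(cp).encode("utf-8")
        if len(encoded) != 2:
            raise ValueError(f"U+{cp:04X} is not a two-byte codepoint")
        lead, tail = encoded
        tails_by_lead[lead].append(tail)

    # Consecutive codepoints that share a lead byte have consecutive
    # continuation bytes, so each group collapses to a single \x range.
    patterns = []
    for lead in sorted(tails_by_lead):
        tails = tails_by_lead[lead]
        patterns.append(
            f"\\x{lead:02x}[\\x{min(tails):02x}-\\x{max(tails):02x}]"
        )
    return patterns


if __name__ == "__main__":
    # Lowercase Greek alpha through omega spans two lead bytes.
    for pattern in byte_classes("α", "ω"):
        print(pattern)
```

For α through ω this yields two pieces, one per lead byte, which is why a set that looks contiguous in Unicode may still need an alternation at the byte level. It also shows why density matters: a sparse set of codepoints would not collapse into a few tidy byte ranges.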
