Hello again, I’m back with more updates.
I just made a big update to fluent that implements comptime regular expressions and our first iterator that uses them: MatchIterator.
Edit: More regex utilities have been added since this first post, such as split and the fluent string algorithms.
Regular expressions follow the PCRE syntax. This update has a full regex parser and finite-state-automaton generator that builds parse trees at comptime. All parse trees are stateless and have @sizeOf(T) == 0.
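To give a feel for how a comptime-built type can end up with no runtime footprint, here is a minimal sketch. It is not fluent's implementation: `Literal` and `matchAt` are hypothetical names made up for illustration, and the "pattern" is just a literal prefix. The point is only that when the pattern is a comptime parameter, it lives in the type rather than in any field, so the type is zero-sized.

```zig
const std = @import("std");

// Hypothetical illustration, not fluent's implementation: a matcher type
// parameterized by a comptime pattern. The pattern is baked into the type,
// so the struct has no fields and occupies no memory.
fn Literal(comptime pattern: []const u8) type {
    return struct {
        // Returns the length of the match at the start of `src`, or null.
        pub fn matchAt(src: []const u8) ?usize {
            if (std.mem.startsWith(u8, src, pattern)) return pattern.len;
            return null;
        }
    };
}

test "comptime-built matcher is zero-sized" {
    const Abc = Literal("abc");
    try std.testing.expect(@sizeOf(Abc) == 0);
    try std.testing.expectEqual(@as(?usize, 3), Abc.matchAt("abcdef"));
    try std.testing.expectEqual(@as(?usize, null), Abc.matchAt("xabc"));
}
```

A real regex tree would presumably nest types like this one inside each other, but the same reasoning applies as long as none of them hold runtime fields.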
To use regex, just look for the fluent.match function and provide your expression string and source string: fluent.match("[abc]\\d+", str)
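If the iterator idea is new to you, the sketch below shows the general shape of a match iterator using only the standard library. It is a hand-written stand-in, hard-wired to the pattern `\d+`; `DigitRunIterator` is a hypothetical name and says nothing about how fluent generates its iterators. It only illustrates the calling convention: each call to `next()` returns the next matching slice, or null when the source is exhausted.

```zig
const std = @import("std");

// Hypothetical stand-in for a match iterator, hard-wired to `\d+`.
const DigitRunIterator = struct {
    src: []const u8,
    pos: usize = 0,

    pub fn next(self: *DigitRunIterator) ?[]const u8 {
        // Skip characters that cannot start a match.
        while (self.pos < self.src.len and !std.ascii.isDigit(self.src[self.pos])) : (self.pos += 1) {}
        if (self.pos == self.src.len) return null;
        const start = self.pos;
        // Greedily consume the run of digits.
        while (self.pos < self.src.len and std.ascii.isDigit(self.src[self.pos])) : (self.pos += 1) {}
        return self.src[start..self.pos];
    }
};

test "iterating over digit runs" {
    var itr = DigitRunIterator{ .src = "123a456" };
    try std.testing.expectEqualStrings("123", itr.next().?);
    try std.testing.expectEqualStrings("456", itr.next().?);
    try std.testing.expect(itr.next() == null);
}
```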
It’s an ongoing process, but it’s at a point where people can start using it, so be on the lookout for more updates! So far, here’s what’s been implemented:
Special Characters:
\d - digits
\D - non-digits
\w - alphanumeric
\W - non-alphanumeric
\s - whitespace
\S - non-whitespace
. - any character
^ - starts with
$ - ends with
Quantifiers:
+ - one or more
* - zero or more
? - zero or one
{n} - exactly n
{m,n} - between m and n (inclusive)
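The bounded quantifier is greedy, which matters for what is left over for the next match. Here is a small standalone sketch of that behavior for `\d{m,n}`. `matchDigitsBetween` is a hypothetical helper written for this post, not a fluent function.

```zig
const std = @import("std");

// Hypothetical helper, not part of fluent: greedily consume between `min`
// and `max` ASCII digits at the start of `src`, mimicking `\d{min,max}`.
// Returns the matched length, or null if fewer than `min` digits are present.
fn matchDigitsBetween(src: []const u8, min: usize, max: usize) ?usize {
    var n: usize = 0;
    while (n < src.len and n < max and std.ascii.isDigit(src[n])) : (n += 1) {}
    return if (n >= min) n else null;
}

test "bounded repetition is greedy" {
    // `\d{3,4}` on "123456" takes four digits, then "56" is too short to match.
    try std.testing.expectEqual(@as(?usize, 4), matchDigitsBetween("123456", 3, 4));
    try std.testing.expectEqual(@as(?usize, null), matchDigitsBetween("56", 3, 4));
}
```

With `min == max` the same helper behaves like `{n}`, taking exactly that many digits or nothing.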
Operators:
| - alternation (or clause)
() - capture group
[] - character set
[^] - negated character set
[a-z] - character ranges
(?=) - positive lookahead
(?!) - negative lookahead
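Lookaheads are the least obvious entries in this list, because they check text without consuming it. The sketch below shows that idea with plain literals in place of real sub-expressions. `matchWithLookahead` is a hypothetical function for illustration only and does not reflect how fluent implements these operators.

```zig
const std = @import("std");

// Hypothetical sketch of lookahead semantics, not fluent's implementation.
// Matches `head` at the start of `src` only when the text right after it
// starts with `tail` (want_tail = true, like `(?=tail)`) or does not
// (want_tail = false, like `(?!tail)`). The tail is inspected but never
// consumed, so the returned length covers `head` alone.
fn matchWithLookahead(src: []const u8, head: []const u8, tail: []const u8, want_tail: bool) ?usize {
    if (!std.mem.startsWith(u8, src, head)) return null;
    const has_tail = std.mem.startsWith(u8, src[head.len..], tail);
    return if (has_tail == want_tail) head.len else null;
}

test "lookahead checks without consuming" {
    // foo(?=bar)
    try std.testing.expectEqual(@as(?usize, 3), matchWithLookahead("foobar", "foo", "bar", true));
    try std.testing.expectEqual(@as(?usize, null), matchWithLookahead("foobaz", "foo", "bar", true));
    // foo(?!bar)
    try std.testing.expectEqual(@as(?usize, 3), matchWithLookahead("foobaz", "foo", "bar", false));
    try std.testing.expectEqual(@as(?usize, null), matchWithLookahead("foobar", "foo", "bar", false));
}
```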
Examples
{ // match special characters (typical) - one or more
    var itr = match("\\d+", "123a456");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "123");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "456");
    try std.testing.expect(itr.next() == null);
}
{ // match special characters (typical) - exact
    var itr = match("\\d{3}", "123456");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "123");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "456");
    try std.testing.expect(itr.next() == null);
}
{ // match special characters (typical) - between
    var itr = match("\\d{3,4}", "123456");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "1234");
    try std.testing.expect(itr.next() == null);
}
{ // match special characters (inverse)
    var itr = match("\\D+", "123a456");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "a");
    try std.testing.expect(itr.next() == null);
}
{ // pipe-or clauses
    var itr = match("abc|def", "_abc_def_");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "abc");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "def");
    try std.testing.expect(itr.next() == null);
}
{ // capture groups with quantifiers
    var itr = match("(a+bc)+", "_aaabc_abcabc_bc_abc_");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "aaabc");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "abcabc");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "abc");
    try std.testing.expect(itr.next() == null);
}
{ // character sets (typical)
    var itr = match("[a1]+", "_a112_21aa112_a_1_x_2");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "a11");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "1aa11");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "a");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "1");
    try std.testing.expect(itr.next() == null);
}
{ // character sets (negated)
    var itr = match("[^a1]+", "_a112_21aa112_a_1_x_2");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "_");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "2_2");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "2_");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "_");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "_x_2");
    try std.testing.expect(itr.next() == null);
}
{ // character sets (negated, with a special character)
    var itr = match("[^\\d]+", "_a112_21aa112_a_1_x_2");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "_a");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "_");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "aa");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "_a_");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "_x_");
    try std.testing.expect(itr.next() == null);
}
{ // character sets (compound)
    var itr = match("[abc]\\d+", "_ab112_c987b123_d16_");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "b112");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "c987");
    try std.testing.expectEqualSlices(u8, itr.next() orelse unreachable, "b123");
    try std.testing.expect(itr.next() == null);
}
Edit: split and match iterators have been added, and the string backend now supports regex.
There are also many optimizations planned. One example is branch-fusion, where we combine multiple branches and/or remove any redundant branches. We also need to do performance testing and work with the generated assembly to get it exactly like we want. All in good time…
Thanks