A sibling thread refers to the underlying issue: Zig really dislikes tabs. Problem: POSIX really likes tabs, especially in long TSV data, which you might want to build long multi-line strings for.
So instead I'm supposed to use a collection of string fragment files with @embedFile, I suppose?
Horrible UX if you need to deal with tabs IMO.
That's not good enough yet. expectEqualStrings doesn't deal with the escaped inputs properly.
First difference occurs on line 3:
expected:
MyEnum\ta.zig\t/^const MyEnum = enum {$/;"\tenum
^ ('\x5c')
found:
MyEnum a.zig /^const MyEnum = enum {$/;" enum
^ ('\x09')
3/5 Tags.test.Tags.findTags...FAIL (TestExpectedEqual)
NOTE/EDIT: I was wrong; I overlooked that multi-line strings do not support escape sequences. expectEqualStrings does not have to deal with escape sequences, as the compiler replaces a string literal's escape sequences with the correct byte values.
Here. Get ztags, patch it, run zig test src/Tags.zig. Maybe patch it better?
Nah, as soon as you're capturing the output of a writer which does support escape sequences and you want to compare that with expectEqualStrings, you will compare the escape sequence against the actual replaced data, which will fail: \n vs. 0xa, \t vs. 0x9.
Also, I suppose if zig fmt insists on replacing tabs inside values, it should replace them with the escape sequence and not with a random number of spaces. But that then means that for more complicated tab-enabled inputs, you can't properly program a bulk comparison against input texts.
Thanks. Obviously multi-line string literals do not have escapes as per the docs. My first patch did the same thing, but I forgot about ++ and ended up with a very unwieldy string literal, hence my (incorrect) 2nd patch I shared.
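For illustration, here is a minimal sketch of the `++` approach mentioned above, assuming a reasonably recent Zig compiler. The name `tag_line` and its content are made up for this example (modelled on the test output earlier in the thread), not taken from the actual patch. It also shows why the result gets unwieldy quickly:

```zig
const std = @import("std");

// Multi-line literals keep backslashes verbatim, so each tab is spliced in
// from an ordinary literal, where "\t" is a real escape sequence.
// `tag_line` is a hypothetical name used only for this sketch.
const tag_line =
    \\MyEnum
++ "\t" ++
    \\a.zig
++ "\t" ++
    \\/^const MyEnum = enum {$/;"
++ "\t" ++
    \\enum
;

test "tab spliced in with ++" {
    try std.testing.expectEqualStrings(
        "MyEnum\ta.zig\t/^const MyEnum = enum {$/;\"\tenum",
        tag_line,
    );
}
```

Every tab costs two `++` operators and breaks the literal apart, which is fine for one line but painful for a table of TSV rows.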
Ok, that's close to multi-line string literals, WRT literal inputs in your program file.
Given that the string literal supports escape sequences, expectEqualStrings will only ever see the correct byte values.
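A small sketch of the difference, in case it helps: the two constants below look alike in source but hold different bytes. The names `escaped` and `raw` are only for illustration.

```zig
const std = @import("std");

// Ordinary literal: the compiler turns \t into the single byte 0x09.
const escaped = "a\tb";

// Multi-line literal: no escape processing, so the bytes are 'a', '\', 't', 'b'.
const raw =
    \\a\tb
;

test "escape handling differs between literal kinds" {
    try std.testing.expectEqual(@as(usize, 3), escaped.len);
    try std.testing.expectEqual(@as(usize, 4), raw.len);
    try std.testing.expectEqual(@as(u8, 0x09), escaped[1]);
    try std.testing.expectEqual(@as(u8, 0x5c), raw[1]);
}
```

This matches the failure shown earlier: the expected side had a literal backslash ('\x5c') where the found side had a real tab ('\x09').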
As such, I only see the "minor" inconvenience of "it renders differently" remaining, which is the argument for disallowing tabs in places in Zig… it still feels weird to me, but there's nothing to argue here for me.
Where does this missionary attitude come from? There must be a real strong conviction that the valid use-cases for arbitrary string literal content do not outweigh inadvertent mis-formatting.
Well, I checked a couple of projects with multi-line string literals, and some of them have trailing spaces, including some of my own projects and the Zig compiler itself. So it's a pretty common mistake to make.
Here is a command to find them. Crazy how many escapes you need to get \\
If you prefer the style of multi-line strings, you can also use comptime to write a wrapper function to implement escapes. For example, here's a very simple one (with poor error reporting) which handles \t and \\:
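const std = @import("std");

// Expands \t and \\ escapes in a comptime-known string.
// Intended to be called in a comptime context (as `example` below does),
// where the loop and the ++ concatenations are evaluated by the compiler.
inline fn tabs(comptime s: []const u8) []const u8 {
    var res: []const u8 = "";
    var pos: usize = 0;
    while (std.mem.indexOfScalarPos(u8, s, pos, '\\')) |i| {
        if (i + 1 == s.len) @compileError("trailing \\");
        // Copy everything before the backslash, then the expanded escape.
        res = res ++ s[pos..i];
        switch (s[i + 1]) {
            't' => res = res ++ "\t",
            '\\' => res = res ++ "\\",
            else => @compileError("invalid escape"),
        }
        pos = i + 2;
    }
    // Append whatever follows the last escape.
    return res ++ s[pos..];
}

const example = tabs(
    \\hello\tworld
    \\backslash\t\\
    \\
);

test "tabs" {
    try std.testing.expectEqualStrings("hello\tworld\nbackslash\t\\\n", example);
}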
You could even use the utilities in std.zig.string_literal if you want to completely recreate all of Zig's built-in string escapes.
Also, if you're working with TSV data (as you mentioned earlier), you could choose another delimiter (such as |) to replace with tabs, or perhaps better yet, write a helper to allow you to structure your raw data and format it as TSV at comptime:
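A minimal sketch of such a helper, assuming a recent Zig version (indexed `for` loops and result-type coercion of nested `&.{ ... }` literals). The function name `tsv` and the sample rows are hypothetical, chosen only for this example:

```zig
const std = @import("std");

// Hypothetical helper: joins the fields of each row with tabs and ends
// every row with a newline. All work happens at compile time.
fn tsv(comptime rows: []const []const []const u8) []const u8 {
    const result = comptime blk: {
        var out: []const u8 = "";
        for (rows) |row| {
            for (row, 0..) |field, i| {
                if (i != 0) out = out ++ "\t";
                out = out ++ field;
            }
            out = out ++ "\n";
        }
        break :blk out;
    };
    return result;
}

// Sample data, made up for this sketch.
const data = tsv(&.{
    &.{ "name", "file", "kind" },
    &.{ "MyEnum", "a.zig", "enum" },
});

test "tsv helper" {
    try std.testing.expectEqualStrings(
        "name\tfile\tkind\nMyEnum\ta.zig\tenum\n",
        data,
    );
}
```

The source file then contains no raw tabs and no escape sequences inside the data itself. The trade-off is that every field has to be quoted separately.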
The possibilities are endless (and each of these possibilities avoids the potential confusion that could be caused by embedding raw tab characters)
Edit: and one more option (potentially better if you have a large amount of data) is to use @embedFile to embed a raw TSV file (which may contain any content) into the binary.