Tabs in String literals - how to?

msw · January 30, 2025, 9:13pm

How do I get a literal tab character into a string literal, including a multi-line one? Here is what I tried:

//tabs.zig
const std = @import("std");
const t1 = "a\tb";
const t2 = "aTABb";
const t3 =
    \\aTABb
;
pub fn main() void {
    std.debug.print("a\tb,aTABb,{s},{s},{s}\n", .{ t1, t2, t3 });
}

zig run tabs.zig
a	b,aTABb,a	b,aTABb,aTABb

sed -i -e "s,TAB,\t,g" tabs.zig
zig run tabs.zig
tabs.zig:3:12: error: expected expression, found 'invalid token'
const t2 = "a b";
           ^~~~~~

AndrewKraevskii · January 30, 2025, 9:24pm

Zig doesn’t allow a literal tab character inside string literals; see the FAQ on the ziglang/zig GitHub wiki.
Use \t if you need a tab.
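Here is a minimal sketch of what that means in practice. In a regular string literal, each of the escape spellings below yields the same three bytes, with 0x09 in the middle. The constant names are illustrative only.

```zig
const std = @import("std");

// Three spellings that all produce the bytes 'a', 0x09 (tab), 'b'.
const escaped = "a\tb"; // the usual escape sequence
const hex = "a\x09b"; // hexadecimal byte escape
const unicode = "a\u{9}b"; // Unicode escape; U+0009 encodes as one UTF-8 byte
// For comparison: the same bytes written out explicitly.
const bytes = &[_]u8{ 'a', 0x09, 'b' };

test "tab escape spellings" {
    try std.testing.expectEqualStrings(bytes, escaped);
    try std.testing.expectEqualStrings(bytes, hex);
    try std.testing.expectEqualStrings(bytes, unicode);
}
```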

msw · January 30, 2025, 9:25pm

A sibling thread explains the underlying issue: Zig really dislikes tabs. The problem is that POSIX tooling really likes tabs, especially in long TSV data, which you might want to write as long multi-line strings.
So the alternative is a collection of string-fragment files loaded with @embedFile, I suppose?
Horrible UX if you need to deal with tabs, IMO.

msw · January 30, 2025, 9:26pm

That’s not good enough, yet. expectEqualStrings doesn’t deal with the escaped inputs properly.

First difference occurs on line 3:
expected:
MyEnum\ta.zig\t/^const MyEnum = enum {$/;"\tenum
      ^ ('\x5c')
found:
MyEnum	a.zig	/^const MyEnum = enum {$/;"	enum
      ^ ('\x09')
3/5 Tags.test.Tags.findTags...FAIL (TestExpectedEqual)

NOTE/EDIT: I was wrong and overlooked that multi-line strings do not support escape sequences. expectEqualStrings never has to deal with escape sequences, because the compiler replaces a string literal’s escape sequences with the corresponding byte values.
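To make the distinction concrete, here is a small sketch (the names are illustrative) comparing the bytes of a regular literal and a multi-line literal that both contain a backslash followed by t:

```zig
const std = @import("std");

// Regular string literal: the compiler turns the escape into the byte 0x09.
const regular = "a\tb";

// Multi-line string literal: no escapes are processed, so the backslash
// (0x5C) and the 't' stay as two separate bytes.
const multi =
    \\a\tb
;

test "escapes only apply to regular string literals" {
    try std.testing.expectEqual(@as(usize, 3), regular.len);
    try std.testing.expectEqual(@as(u8, 0x09), regular[1]);
    try std.testing.expectEqual(@as(usize, 4), multi.len);
    try std.testing.expectEqual(@as(u8, '\\'), multi[1]);
}
```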

AndrewKraevskii · January 30, 2025, 9:27pm

Can you give a full code sample?

msw · January 30, 2025, 9:28pm

Here. Get ztags, apply the patch, and run zig test src/Tags.zig. Maybe you can patch it better?
Nah, as soon as you capture the output of a writer (which emits real tab and newline bytes) and compare it with expectEqualStrings, you end up comparing the escape sequence against the actual byte value, which fails: \n vs. 0x0A, \t vs. 0x09.

Also, if zig fmt insists on replacing tabs inside values, I suppose it should replace them with the escape sequence rather than an arbitrary number of spaces. But that still means that for more complicated tab-containing inputs, you can’t properly bulk-compare against input texts.

ISTM there’s a valid use-case oversight here.

msw · January 30, 2025, 9:51pm

Would one build the string at comptime from \t escapes plus concatenation (or formatting), so that the correct byte values end up in the resulting runtime string?
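A sketch of both approaches, assuming a Zig standard library that provides std.fmt.comptimePrint. In either case the escapes live in regular literals, so the resulting constant holds real tab and newline bytes. The column values are made up for illustration.

```zig
const std = @import("std");

// Built with ++: every piece is a regular literal, so \t and \n are real bytes.
const by_concat = "name\tfile\tkind\n" ++
    "MyEnum\ta.zig\tenum\n";

// Built with std.fmt.comptimePrint: the format string is a regular literal
// too, so the tab and newline bytes end up in the resulting constant.
const by_print = std.fmt.comptimePrint("{s}\t{s}\t{s}\n{s}\t{s}\t{s}\n", .{
    "name",   "file",  "kind",
    "MyEnum", "a.zig", "enum",
});

test "comptime-built strings contain real tab bytes" {
    try std.testing.expectEqualStrings(by_concat, by_print);
}
```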

AndrewKraevskii · January 30, 2025, 9:52pm

Here is a patch that passes zig test src/Tags.zig:

new_diff.patch.txt (5.8 KB)

Basically, you would use

const string = "here goes tab\tand linebreak\n" ++
    "next line\n";

instead of the multi-line form, in which \t is not an escape and stays a literal backslash followed by t:

const string =
    \\here goes tab\tand linebreak
    \\next line
    \\
;

Some other ways to do it are discussed in “Make trailing whitespace at the end of multiline strings an error” (ziglang/zig issue #19299 on GitHub).

It is quite possible that multi-line strings will be restricted even further, as that issue proposes.

msw · January 30, 2025, 10:01pm

Thanks. Obviously, multi-line string literals do not have escapes, as per the docs. My first patch did the same thing, but I forgot about ++ and ended up with a very unwieldy string literal, hence the (incorrect) second patch I shared.

OK, that’s close to multi-line string literals with respect to literal inputs in your program file.

Given that the string literal supports escape sequences, expectEqualStrings will only ever see the correct byte values.

As such, the only thing I see remaining is the ‘minor’ inconvenience that “it renders differently”, which is the argument for disallowing tabs in certain places in Zig. It still feels weird to me, but I have nothing left to argue here.
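A sketch of the point that expectEqualStrings only ever sees the correct byte values, with a hypothetical formatTag helper standing in for the ztags writer (it is not ztags code). std.fmt.bufPrint writes real tab bytes, and the expected value is a regular literal, so the comparison is 0x09 against 0x09.

```zig
const std = @import("std");

/// Hypothetical helper (not from ztags): formats one tab-separated tag line
/// into `buf`, standing in for "capturing the output of a writer".
fn formatTag(buf: []u8, name: []const u8, file: []const u8, kind: []const u8) ![]u8 {
    return std.fmt.bufPrint(buf, "{s}\t{s}\t{s}\n", .{ name, file, kind });
}

test "formatted output matches an escaped expected string" {
    var buf: [128]u8 = undefined;
    const out = try formatTag(&buf, "MyEnum", "a.zig", "enum");
    // The expected value is a regular literal, so \t is already byte 0x09 here.
    try std.testing.expectEqualStrings("MyEnum\ta.zig\tenum\n", out);
}
```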

msw · January 30, 2025, 10:07pm

Where does this missionary attitude come from? There must be a really strong conviction that the valid use cases for arbitrary string literal content do not outweigh the risk of inadvertent mis-formatting.

AndrewKraevskii · January 30, 2025, 10:20pm

Well, I checked a couple of projects with multi-line string literals, and some of them have trailing spaces, including some of my own projects and the Zig compiler itself. So it’s a pretty common mistake.

Here is a command to find them. Crazy how many escapes you need to match \\:

rg "\\\\\\\\.+\s+$"
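If you would rather avoid the quoting gymnastics, the same check can be sketched in Zig itself. The countTrailingWhitespace helper below is hypothetical and assumes LF line endings. It only looks at lines that begin (after indentation) with the \\ marker, so its matches differ slightly from the rg pattern, which accepts \\ anywhere in a line.

```zig
const std = @import("std");

/// Hypothetical helper: counts multi-line string literal lines (lines whose
/// first non-blank characters are `\\`) that end in a space or tab.
/// Assumes LF line endings; a trailing '\r' is not treated as whitespace.
fn countTrailingWhitespace(source: []const u8) usize {
    var count: usize = 0;
    var lines = std.mem.splitScalar(u8, source, '\n');
    while (lines.next()) |line| {
        // Skip indentation before the `\\` marker.
        var i: usize = 0;
        while (i < line.len and (line[i] == ' ' or line[i] == '\t')) : (i += 1) {}
        if (!std.mem.startsWith(u8, line[i..], "\\\\")) continue;
        // The line holds at least the two marker bytes, so indexing is safe.
        const last = line[line.len - 1];
        if (last == ' ' or last == '\t') count += 1;
    }
    return count;
}

test "finds trailing whitespace in multi-line literal lines" {
    const source =
        "const s =\n" ++
        "    \\\\ok\n" ++
        "    \\\\bad \n" ++
        "    \\\\\n" ++
        ";\n";
    try std.testing.expectEqual(@as(usize, 1), countTrailingWhitespace(source));
}
```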

msw · January 30, 2025, 10:23pm

With single quotes, rg '\\\\.+\s+$' needs only half the backslashes.

AndrewKraevskii · January 30, 2025, 10:23pm

What? ' and " behave differently in bash?

ianprime0509 · January 31, 2025, 3:53am

If you prefer the style of multiline strings, you can also use comptime to write a wrapper function to implement escapes. For example, here’s a very simple one (with poor error reporting) which handles \t and \\:

const std = @import("std");

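inline fn tabs(comptime s: []const u8) []const u8 {
    // ++ needs comptime-known operands, so call this from a comptime
    // context, such as a container-level const initializer.
    var res: []const u8 = "";
    var pos: usize = 0;
    // Copy everything up to each backslash, then translate the escape after it.
    while (std.mem.indexOfScalarPos(u8, s, pos, '\\')) |i| {
        if (i + 1 == s.len) @compileError("trailing \\");
        res = res ++ s[pos..i];
        switch (s[i + 1]) {
            't' => res = res ++ "\t",
            '\\' => res = res ++ "\\",
            else => @compileError("invalid escape"),
        }
        pos = i + 2; // resume after the two-character escape
    }
    // Append whatever follows the last escape.
    return res ++ s[pos..];
}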

const example = tabs(
    \\hello\tworld
    \\backslash\t\\
    \\
);

test "tabs" {
    try std.testing.expectEqualStrings("hello\tworld\nbackslash\t\\\n", example);
}

You could even use the utilities in std.zig.string_literal if you want to completely recreate all of Zig’s built-in string escapes.

Also, if you’re working with TSV data (as you mentioned earlier), you could choose another delimiter (such as |) to replace with tabs, or perhaps better yet, write a helper to allow you to structure your raw data and format it as TSV at comptime:

const my_data = tsv(.{
    .{"one", "two"},
    .{"three", "four"},
});
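One possible shape for such a tsv helper, as a sketch (tsv is a hypothetical name, not a std function): it walks a tuple of tuples of strings at comptime, joins each row's fields with a tab byte, and ends every row with a newline.

```zig
const std = @import("std");

/// Hypothetical comptime helper: `rows` is a tuple of tuples of strings.
/// Fields are joined with a tab byte and every row ends with '\n'.
fn tsv(comptime rows: anytype) []const u8 {
    comptime var out: []const u8 = "";
    inline for (rows) |row| {
        comptime var first = true;
        inline for (row) |field| {
            if (!first) out = out ++ "\t";
            out = out ++ field;
            first = false;
        }
        out = out ++ "\n";
    }
    return out;
}

const my_data = tsv(.{
    .{ "one", "two" },
    .{ "three", "four" },
});

test "tsv" {
    try std.testing.expectEqualStrings("one\ttwo\nthree\tfour\n", my_data);
}
```

Because the rows and fields are tuples of differently sized string literals, the loops have to be inline for, and the whole string is assembled at compile time with ++.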

The possibilities are endless (and each of these possibilities avoids the potential confusion that could be caused by embedding raw tab characters).

Edit: and one more option (potentially better if you have a large amount of data) is to use @embedFile to embed a raw TSV file (which may contain any content) into the binary.