String Operations at comptime: convert @embedFile(windows_file) to unix_file

Hi,
I am currently converting “\r\n” from windows files to “\n” using std.mem.replacementSize, alloc and std.mem.replace at runtime. Not sure if it’s the best way but it works as a proof of concept.
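For reference, the runtime approach described above might look roughly like the sketch below. The function name `crlfToLfAlloc` is hypothetical, and the caller is assumed to own (and free) the returned slice.

```zig
const std = @import("std");

// Hypothetical helper (not from the original post): replaces every "\r\n"
// in `input` with "\n" and returns a newly allocated slice.
fn crlfToLfAlloc(allocator: std.mem.Allocator, input: []const u8) ![]u8 {
    // First pass: compute how many bytes the converted text will need.
    const size = std.mem.replacementSize(u8, input, "\r\n", "\n");
    const output = try allocator.alloc(u8, size);
    // Second pass: write the converted text into the exactly sized buffer.
    // The return value (number of replacements) is not needed here.
    _ = std.mem.replace(u8, input, "\r\n", "\n", output);
    return output;
}
```

Note that this only rewrites "\r\n" pairs; a lone "\r" is left untouched.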
However, since those files are static, they could be embedded directly in the executable with @embedFile at compile time. I have been scratching my head for a while now trying to convert those strings and strip the “\r” characters at compile time, but no luck so far.
So, in desperation, I thought I would ask the more experienced Ziggit community :) Is it possible to do that without a comptime allocator? Or should I instead write a function in my build.zig to pre-convert those files before embedding them with @embedFile?
Thank you for your help.

Does this conversion happen before compiling? Then I guess using some sed or awk command would make more sense?

Hi @wlolwz
See “\n vs \r in readUntilDelimiterOrEofAlloc” (post #2, by dimdin).

I ended up writing my function in build.zig and it works well that way.

The general answer is:

  • Create an array with the required size or an upper bound of the size
  • Populate the array
  • Assign the populated array to a const, and return a reference to that (see Comptime-Mutable Memory Changes)

Here’s a simple version that just strips \r characters from a comptime-known string:

const std = @import("std");

fn crlfToLf(comptime str: []const u8) []const u8 {
    comptime {
        var buf: [str.len]u8 = undefined;
        var i: usize = 0;
        for (str) |c| {
            if (c == '\r') continue;
            buf[i] = c;
            i += 1;
        }
        // since `i` is comptime-known,
        // `buf[0..i]` results in a pointer-to-array,
        // so we dereference that to get a copy
        // of the array with the correct length
        const final = buf[0..i].*;
        return &final;
    }
}

const some_data = "abc\r\ndef\r\n";
const some_data_lf = crlfToLf(some_data);

pub fn main() !void {
    std.debug.print("before: {}\n", .{std.fmt.fmtSliceEscapeLower(some_data)});
    std.debug.print("after: {}\n", .{std.fmt.fmtSliceEscapeLower(some_data_lf)});
}

However, note that the above implementation loses the null-terminator and the “pointer-to-array-ness” of the type of string literals (and of the type returned by @embedFile): it returns a plain []const u8 slice. The null-terminator is easy enough to add back in if you want. If you also want to keep the “pointer-to-array-ness”, one way to go is to calculate the final length upfront and then use that calculated length in the return type. An example of that can be found in the standard library.
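Independently of the standard-library code referred to above, a minimal sketch of that idea might look like this. The names `lfLen` and `crlfToLfArray` are made up for illustration; the first pass computes the final length, which then appears in the return type.

```zig
const std = @import("std");

// Hypothetical helper: number of bytes left after dropping every '\r'.
fn lfLen(comptime str: []const u8) usize {
    comptime {
        var n: usize = 0;
        for (str) |c| {
            if (c != '\r') n += 1;
        }
        return n;
    }
}

// Hypothetical variant of crlfToLf that returns a pointer to a
// null-terminated array, the same kind of type a string literal has.
fn crlfToLfArray(comptime str: []const u8) *const [lfLen(str):0]u8 {
    comptime {
        const len = lfLen(str);
        // Zero-initialized, sentinel-terminated buffer of the exact final size.
        var buf: [len:0]u8 = [_:0]u8{0} ** len;
        var i: usize = 0;
        for (str) |c| {
            if (c == '\r') continue;
            buf[i] = c;
            i += 1;
        }
        // Copy into a const so the returned pointer refers to immutable data.
        const final = buf;
        return &final;
    }
}
```

Under these assumptions it could be used as `const unix_text = crlfToLfArray(@embedFile("windows_file.txt"));`, where the file name is only a placeholder. For larger embedded files, the comptime loops may exceed the default evaluation branch quota, in which case it would need to be raised with @setEvalBranchQuota.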

It might be easier to write a program that takes a file as input and outputs a file in the desired format. Then call that program from build.zig. Your actual code can just consume the final file, without having to transform it.

This part of the build system documentation is a good example that demonstrates how that can be done: Zig Build System ⚡ Zig Programming Language

Damn, I feel stupid now for not having been able to figure out those 10 lines of code :)
Thanks a lot for going above and beyond my question with all that extra information on top of the example.

Indeed, that’s what I ended up doing, with a pre_build.zig that is called during the build, adding a few more formatting and find/replace steps along the way as well.

This is great, much better than what I came up with. I’ll have a deeper look into it.