This question is loosely related to the other question I asked. (No need to read it! This is a totally different question.)
I've figured out a lot since then thanks to the immense help I got from all of you, for which I'm super grateful. If this question is too stupid to ask, or I'm getting too spammy, please feel free to remove it!
I haven't had much time lately. Since my first question I've only spent 1-2 hours at a time, and not every day, learning anything new.
Here are my two files that I need to show:
common.zig
(This code works, but it’s wild how it’s written)
const latinizer = @import("converters/latinizer.zig");
const std = @import("std");
// Everything here is for learning purposes
pub fn getCyrillicToLatinicHashedMap() ![1102][2]u8 {
    var hashed: [1102][2]u8 = undefined;
    var iterator = std.unicode.Utf8Iterator{ .bytes = latinizer.cyrillic_to_latinic, .i = 0 };
    var map_iterator = std.unicode.Utf8Iterator{ .bytes = latinizer.cyrillic_to_latinic_map, .i = 0 };
    while (iterator.nextCodepoint()) |char| {
        _ = try std.unicode.utf8Encode(map_iterator.nextCodepoint() orelse unreachable, &hashed[char]);
        std.debug.print("{} - {s}\n", .{ char, hashed[char] });
    }
    return hashed;
}
main.zig
(The project structure hasn’t been decided yet, I’m still learning the basics)
const std = @import("std");
const transliterator = @import("lib.zig");
pub fn main() !void {
    const cyrl_to_lat_hashed_map = try transliterator.common.getCyrillicToLatinicHashedMap();
    const stdin_reader = std.io.getStdIn().reader();
    var buf: [1024]u8 = undefined;
    std.debug.print("Input cyrilic text: ", .{});
    _ = try stdin_reader.readUntilDelimiterOrEof(&buf, '\n');
    std.debug.print("\n{s}", .{buf});
    var buf_copy: [1024]u8 = undefined;
    // Don't judge me pwease. I don't know enough of Zig yet, okay?
    for (buf, 0..) |value, i| {
        buf_copy[i] = value;
    }
    var buf_iter = std.unicode.Utf8Iterator{ .bytes = &buf_copy, .i = 0 };
    var i: usize = 0;
    while (buf_iter.nextCodepoint()) |char| {
        var map_iter = std.unicode.Utf8Iterator{ .bytes = &cyrl_to_lat_hashed_map[char], .i = 0 };
        while (map_iter.nextCodepoint()) |map_char| { // The issue is here
            i += try std.unicode.utf8Encode(map_char, buf[i..]);
            break;
        }
        // if (char == undefined) break;
    }
    std.debug.print("Result: {s}\n", .{buf});
}
The project compiles fine, but here's roughly what I get when I run it:
...
1093 - x�
1098 - ʼ // This okina needs two bytes to encode
1099 - �
1100 - �
1101 - e�
Input cyrilic text: Хатоликлар // My input
...
thread 359846 panic: attempt to unwrap error: Utf8InvalidStartByte
/usr/lib/zig/std/unicode.zig:28:5: 0x1069578 in utf8ByteSequenceLength (zigxatolik)
return switch (first_byte) {
^
/usr/lib/zig/std/unicode.zig:402:69: 0x1039ca0 in nextCodepointSlice (zigxatolik)
const cp_len = utf8ByteSequenceLength(it.bytes[it.i]) catch unreachable;
^
/usr/lib/zig/std/unicode.zig:408:44: 0x1035957 in nextCodepoint (zigxatolik)
const slice = it.nextCodepointSlice() orelse return null;
^
/home/sohro/projects/zigxatolik/src/main.zig:23:38: 0x10363e8 in main (zigxatolik)
while (map_iter.nextCodepoint()) |map_char| {
^
/usr/lib/zig/std/start.zig:524:37: 0x1035545 in posixCallMainAndExit (zigxatolik)
const result = root.main() catch |err| {
^
/usr/lib/zig/std/start.zig:266:5: 0x1035061 in _start (zigxatolik)
asm volatile (switch (native_arch) {
^
???:?:?: 0x0 in ??? (???)
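One observation about the output above, offered as a guess rather than a verified diagnosis: entries such as `x�` look like a one-byte encoding followed by a byte that was never written, since `hashed` starts out `undefined` and every slot is two bytes wide. A minimal sketch of one way around that is to keep the length that `std.unicode.utf8Encode` returns and only ever hand out that many bytes. The `Entry` type and `makeEntry` helper below are hypothetical and do not exist in the project above.

```zig
const std = @import("std");

// Hypothetical table entry (not part of the original project): the encoded
// bytes plus a count of how many of them were actually written.
const Entry = struct {
    bytes: [2]u8 = .{ 0, 0 },
    len: u3 = 0, // 0 means "nothing stored for this codepoint"

    // Only the written bytes are exposed, never the unused tail.
    fn slice(self: *const Entry) []const u8 {
        return self.bytes[0..self.len];
    }
};

// Hypothetical helper: encode `cp` and remember the length utf8Encode reports.
fn makeEntry(cp: u21) !Entry {
    var entry = Entry{};
    // The slot is two bytes wide, so reject anything that needs more.
    const needed = try std.unicode.utf8CodepointSequenceLength(cp);
    if (needed > entry.bytes.len) return error.MappingTooLong;
    entry.len = try std.unicode.utf8Encode(cp, &entry.bytes);
    return entry;
}

pub fn main() !void {
    const x = try makeEntry('x'); // one byte in UTF-8
    const okina = try makeEntry(0x02BC); // two bytes in UTF-8
    std.debug.print("{s} {s}\n", .{ x.slice(), okina.slice() });
}
```

With something like this, a slot that was never filled would yield an empty slice instead of undefined bytes, so the lookup side could skip it rather than iterate over it.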
Is it possible to handle the unreachable that the function is causing in the header of the while loop? I tried `orelse break` and `while () {} else break;`, but I don't have enough understanding of optionals, or any other knowledge that is needed here, yet.
while (map_iter.nextCodepoint()) |map_char| { // The issue is here
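On the `unreachable` itself: judging by the trace, the `catch unreachable` sits inside `nextCodepointSlice`, so what reaches the call site is a safety panic rather than an error or a null. That would explain why `orelse break` and `while ... else` in the loop header do not help. `Utf8Iterator` appears to assume its bytes are already valid UTF-8. A minimal sketch of validating first with `std.unicode.Utf8View.init`, which returns an error for invalid input, might look like this. `firstCodepoint` is a hypothetical helper, not something from the project.

```zig
const std = @import("std");

// Hypothetical helper: returns the first codepoint of `bytes`, or null when
// `bytes` is empty or is not valid UTF-8.
fn firstCodepoint(bytes: []const u8) ?u21 {
    // Utf8View.init validates the whole slice and reports invalid input as
    // an error, which can be handled here instead of panicking later.
    const view = std.unicode.Utf8View.init(bytes) catch return null;
    var it = view.iterator();
    return it.nextCodepoint();
}

pub fn main() void {
    const good = "x";
    const bad = [_]u8{ 'x', 0xAA }; // 0xAA cannot start a UTF-8 sequence
    std.debug.print("{?} {?}\n", .{ firstCodepoint(good), firstCodepoint(&bad) });
}
```

Because the result is an ordinary optional, the caller can use `orelse` or an `if` capture on it, which is the kind of handling the loop header above was reaching for.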