Porting handwritten JavaScript to Zig: Native Messaging host

I posted the Zig version of a Native Messaging host here that users flagged and moderators removed due to the “generated” code being from Google Gemini.

I’ll try again.

This is the original, handwritten JavaScript (originally written and tested in QuickJS JavaScript engine/runtime) Native Messaging host that I have used online converters to port to Zig, Rust, Lua, Ruby and several other programming languages: NativeMessagingHosts/nm_qjs_64.js at main · guest271314/NativeMessagingHosts · GitHub

#!/usr/bin/env -S /home/user/bin/qjs -m --std
// QuickJS Native Messaging host
// Original hand written 64 MiB parsing/processing implementation
// guest271314, 5-6-2022

function getMessage() {
  const header = new Uint32Array(1);
  std.in.read(header.buffer, 0, 4);
  const output = new Uint8Array(header[0]);
  const len = std.in.read(output.buffer, 0, output.length);
  return output;
}
function sendMessage(message) {
  if (message.length > 1024 ** 2) {
    const json = message;
    const data = new Array;
    let fromIndex = 1024 ** 2 - 8;
    let index = 0;
    let i = 0;
    do {
      i = json.indexOf(44, fromIndex);
      const arr = json.subarray(index, i);
      data.push(arr);
      index = i;
      fromIndex += 1024 ** 2 - 8;
    } while (fromIndex < json.length);
    if (index < json.length) {
      data.push(json.subarray(index));
    }
    for (let j = 0;j < data.length; j++) {
      const start = data[j][0];
      const end = data[j][data[j].length - 1];
      if (start === 91 && end !== 44 && end !== 93) {
        const x = new Uint8Array(data[j].length + 1);
        for (let i2 = 0;i2 < data[j].length; i2++) {
          x[i2] = data[j][i2];
        }
        x[x.length - 1] = 93;
        data[j] = x;
      }
      if (start === 44 && end !== 93) {
        const x = new Uint8Array(data[j].length + 1);
        x[0] = 91;
        for (let i2 = 1;i2 < data[j].length; i2++) {
          x[i2] = data[j][i2];
        }
        x[x.length - 1] = 93;
        data[j] = x;
      }
      if (start === 44 && end === 93) {
        const x = new Uint8Array(data[j].length);
        x[0] = 91;
        for (let i2 = 1;i2 < data[j].length; i2++) {
          x[i2] = data[j][i2];
        }
        data[j] = x;
      }
    }
    for (let k = 0;k < data.length; k++) {
      const arr = data[k];
      const header = Uint32Array.from({
        length: 4
      }, (_, index2) => arr.length >> index2 * 8 & 255);
      const output = new Uint8Array(header.length + arr.length);
      output.set(header, 0);
      output.set(arr, 4);
      std.out.write(output.buffer, 0, output.length);
      std.out.flush();
    }
  } else {
    const header = Uint32Array.from({
      length: 4
    }, (_, index) => message.length >> index * 8 & 255);
    const output = new Uint8Array(header.length + message.length);
    output.set(header, 0);
    output.set(message, 4);
    std.out.write(output.buffer, 0, output.length);
    std.out.flush();
  }
}
function main() {
  while (true) {
    const message = getMessage();
    sendMessage(message);
  }
}
try {
  main();
} catch (e) {
  std.writeFile("err.txt", e.message);
  std.exit(0);
}

Here’s the protocol

Chrome starts each native messaging host in a separate process and communicates with it using standard input (stdin) and standard output (stdout). The same format is used to send messages in both directions; each message is serialized using JSON, UTF-8 encoded and is preceded with 32-bit message length in native byte order. The maximum size of a single message from the native messaging host is 1 MB, mainly to protect Chrome from misbehaving native applications. The maximum size of the message sent to the native messaging host is 64 MiB.

Note that the protocol currently uses JSON. That means that when a client sends the maximum it can to a host (64 MiB), the host can send only 1 MiB back to the client per message. For simplicity and versatility I’m only dealing with JSON Arrays, which are closest to Uint8Array (u8), which can basically contain any kind of data, from live streaming audio and video to QUIC streams, etc.
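To make the framing concrete, here is a minimal sketch of one message round trip in JavaScript. The names encodeMessage, decodeMessage and MAX_HOST_TO_CLIENT are hypothetical, and it assumes a runtime that provides TextEncoder and TextDecoder (Node.js, for example), which is not a given in every engine.

```javascript
// Sketch only: frame one JSON value as "4-byte length + UTF-8 JSON".
// MAX_HOST_TO_CLIENT, encodeMessage and decodeMessage are hypothetical names.
const MAX_HOST_TO_CLIENT = 1024 ** 2; // cap on a single host -> client message

function encodeMessage(value) {
  const body = new TextEncoder().encode(JSON.stringify(value));
  if (body.length > MAX_HOST_TO_CLIENT) {
    throw new RangeError(`message is ${body.length} bytes, limit is ${MAX_HOST_TO_CLIENT}`);
  }
  const framed = new Uint8Array(4 + body.length);
  // A Uint32Array stores its value in the platform's native byte order.
  framed.set(new Uint8Array(new Uint32Array([body.length]).buffer), 0);
  framed.set(body, 4);
  return framed;
}

function decodeMessage(framed) {
  // slice() copies the 4 header bytes into a fresh, aligned buffer.
  const length = new Uint32Array(framed.slice(0, 4).buffer)[0];
  const body = framed.subarray(4, 4 + length);
  return JSON.parse(new TextDecoder().decode(body));
}
```

For example, encodeMessage("test") produces 10 bytes: the 4-byte header followed by the 6 bytes of the JSON text "test" including its quotation marks.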

Assuming I have only the experience of trying to port that JavaScript code to Zig using online converters, how would you go about porting the code to Zig for cross-target purposes?

I am a human asking other humans for help here. Thanks.

If you mean, “what would a good zig version of getMessage() look like, for instance,” then I think it would make sense for you to try one, see if it works as expected (say, in a test), and post with any trouble you encounter. I’d start with getMessage() because it’s nice and short and simple. That doesn’t mean it doesn’t take a little effort wrt/ how to process stdin, for instance. sendMessage() looks to me like it has a lot of room for a much more elegant zig solution, so that might be fun, when you get to it.

If you mean, otoh, that you need help with something else related to “cross-target purposes”, it might help to know a little more about what you’re looking for. Specifying appropriate compile targets? Dealing with WASM perhaps?

I’m asking where to start looking for an example of reading STDIN in stable Zig. There was a big difference between the 0.15.2 version the computer program spit out and the 0.16.0 version the computer program spit out, based on my original handwritten JavaScript source. That source, by the way, is challenging to write targeting multiple JavaScript engines, runtimes, and interpreters, because there’s no specified I/O at all in ECMA-262 and engines can pass test262 just fine without the ability to read STDIN or write to STDOUT; and each engine or runtime does it differently!

What I’m doing is exploring a few things at once. I started with JavaScript and have now implemented the algorithm in around 20 or so different programming languages, created benchmarks therefor - and where applicable, yes, compiled the same code to WASM and benchmarked those binaries being executed too.

The idea being that any given algorithm can be cross-compiled and ported to any language. Then the only metric that matters is time - the fastest empirically “wins”.

I’m also testing the claim that a historical figure “transliterated” symbols for which they had no verification of their guesses by humans fluent in the symbols. I’ll leave that part of the exercise at that.

Let’s start with the main function. We will get an allocator and Io instance from the main function.

pub fn main(init: std.process.Init) !void {
    const alloc = init.gpa;
    const io = init.io;
    
    // ...
}

We can get stdin with

const stdin = std.Io.File.stdin();

To read from it, we need to create a reader. This requires us to pass a buffer.

var stdin_buffer: [4096]u8 = undefined;
var stdin_reader = stdin.reader(io, &stdin_buffer);

Likewise, we get stdout and create a writer.

const stdout = std.Io.File.stdout();
var stdout_buffer: [4096]u8 = undefined;
var stdout_writer = stdout.writer(io, &stdout_buffer);

getMessage will need access to the stdin reader interface. It will return a slice of bytes. Since messages can have different sizes, we need to allocate these bytes on the heap, so we also need to pass an allocator. So the signature will be:

fn getMessage(reader: *std.Io.Reader, alloc: std.mem.Allocator) ![]u8 {
    // ...
}

sendMessage will obviously need the message to send. It also needs the stdout writer.

fn sendMessage(writer: *std.Io.Writer, message: []const u8) !void {
    // ...
}

Now that we know what the functions look like, we can call them from main.

pub fn main(init: std.process.Init) !void {
    const alloc = init.gpa;
    // (...)
    var stdout_writer = stdout.writer(io, &stdout_buffer);
    
    while (true) {
        const message = try getMessage(&stdin_reader.interface, alloc);
        defer alloc.free(message);
        try sendMessage(&stdout_writer.interface, message);
    }
}

Note that we must free the returned slice from getMessage.

getMessage is the simplest, so let’s start with that. We read a header of 4 bytes containing only the size of the message in native endianness. We can do that with:

fn getMessage(reader: *std.Io.Reader, alloc: std.mem.Allocator) ![]u8 {
    const header = try reader.takeInt(u32, .native);
    // ...
}

We then need to allocate a buffer of that size and read the message from stdin into that buffer.

const output = try alloc.alloc(u8, header);
try reader.readSliceAll(output);
return output;

Now, we continue to sendMessage. I found the part for long messages quite confusing, so I’ll start with the part for short messages.
There we construct a header that contains the length of the message. First of all, this encodes the length as little endian, but your protocol says it should be native endianness. Furthermore, you create an array of 4 Uint32 values, which is not needed. In Zig, we can write it as:

fn sendMessage(writer: *std.Io.Writer, message: []const u8) !void {
    if (message.len > 1024 * 1024) {
        // ...
    } else {
        try writer.writeInt(u32, @intCast(message.len), .little); // maybe .native?
        try writer.writeAll(message);
    }
    try writer.flush();
}
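The endianness point can be checked from the JavaScript side. The sketch below puts the shift-and-mask header from the original sendMessage next to an explicit little-endian header and a native-order header. The three function names are hypothetical and exist only for this comparison.

```javascript
// Sketch only: three ways to build the 4-byte length header.
// All three function names are hypothetical.

// Same shift-and-mask expression as the original, but stored as bytes.
function headerFromShifts(length) {
  return Uint8Array.from({ length: 4 }, (_, i) => (length >> (i * 8)) & 255);
}

// Explicit little-endian encoding through a DataView.
function headerLittleEndian(length) {
  const bytes = new Uint8Array(4);
  new DataView(bytes.buffer).setUint32(0, length, true);
  return bytes;
}

// Native byte order: whatever the platform uses for a Uint32Array.
function headerNative(length) {
  return new Uint8Array(new Uint32Array([length]).buffer);
}
```

The first two always agree, so the shift version is little endian regardless of platform. headerNative matches them only on a little-endian machine, which is why the two spellings happen to behave the same on x86_64.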

It took me some time to understand what’s happening for long messages (like how 44, 91 and 93 refer to ASCII characters). It’s possible that I misunderstood what it does, but my understanding is that it splits the data at a comma, modifies the start and end of the data, and sends it.
You create a list of those parts, but I think it’s more efficient to do the modification and sending directly when splitting.
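As a rough illustration of emitting each part while splitting, here is a JavaScript sketch written as a generator. It is not the original algorithm: it searches backward for a comma so a chunk never exceeds the limit, whereas the original searches forward from a fixed offset. The name splitJsonArray is hypothetical, and the sketch assumes a flat JSON Array of numbers or null, so that every comma separates two elements.

```javascript
// Sketch only: split the bytes of a flat JSON Array into smaller JSON Arrays.
// splitJsonArray is a hypothetical name. Assumes no commas inside strings
// or nested values, and a limit of at least 3 bytes.
function* splitJsonArray(bytes, limit) {
  const OPEN = 91, CLOSE = 93, COMMA = 44; // '[', ']', ','
  if (bytes[0] !== OPEN || bytes[bytes.length - 1] !== CLOSE) {
    throw new TypeError("expected the bytes of a JSON Array");
  }
  const end = bytes.length - 1; // index of the closing ]
  let pos = 1;                  // first byte after the opening [
  while (true) {
    let cut = end;
    if (end - pos > limit - 2) {
      // Last comma that still leaves room for the added [ and ].
      cut = bytes.lastIndexOf(COMMA, pos + limit - 2);
      if (cut <= pos) throw new RangeError("no comma found within the limit");
    }
    const chunk = new Uint8Array(cut - pos + 2);
    chunk[0] = OPEN;
    chunk.set(bytes.subarray(pos, cut), 1);
    chunk[chunk.length - 1] = CLOSE;
    yield chunk;                // the caller can frame and send this right away
    if (cut === end) return;
    pos = cut + 1;              // skip the comma itself
  }
}
```

Each yielded chunk is a complete JSON Array, so the caller only needs to prepend the length header and write it out, without keeping a list of parts around.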

To avoid copying the parts to a new array for modification, I created a utility function that writes a message, optionally with a character prepended and appended.

fn writePart(writer: *std.Io.Writer, message: []const u8, prepend: ?u8, append: ?u8) !void {
    var length: usize = 0;
    if (prepend != null) {
        length += 1;
    }
    if (append != null) {
        length += 1;
    }
    
    try writer.writeInt(u32, @intCast(length), .little); // maybe .native?
    
    if (prepend) |char| {
        try writer.writeByte(char);
    }
    try writer.writeAll(message);
    if (append) |char| {
        try writer.writeByte(char);
    }
}

The splitting part starts similar to the JavaScript code:

var from_index: usize = 1024 * 1024 - 8;
var index: usize = 0;

while (from_index < message.len) {
    // ...
}

Then we select the first comma after from_index and get the part that we split.

const i = std.mem.findScalarPos(u8, message, from_index, ',') orelse message.len;
const part = message[index .. i];
index = i;
from_index += 1024 * 1024 - 8;

Then the only thing that’s needed is the modification of the parts and sending them. We can do that in one step.

const start = part[0];
const end = part[part.len - 1];

if (start == '[' and end != ',' and end != ']') {
    // append a ]
    try writePart(writer, part, null, ']');
} else if (start == ',' and end != ']') {
    // replace first char by [ and append ]
    try writePart(writer, part[1..], '[', ']');
} else if (start == ',' and end == ']') {
    // replace first char by [
    try writePart(writer, part[1..], '[', null);
}
Full code
const std = @import("std");

fn getMessage(reader: *std.Io.Reader, alloc: std.mem.Allocator) ![]u8 {
    const header = try reader.takeInt(u32, .native);
    const output = try alloc.alloc(u8, header);
    try reader.readSliceAll(output);
    return output;
}

fn sendMessage(writer: *std.Io.Writer, message: []const u8) !void {
    if (message.len > 1024 * 1024) {
        var from_index: usize = 1024 * 1024 - 8;
        var index: usize = 0;
        
        while (from_index < message.len) {
            const i = std.mem.findScalarPos(u8, message, from_index, ',') orelse message.len;
            const part = message[index .. i];
            index = i;
            from_index += 1024 * 1024 - 8;
            
            const start = part[0];
            const end = part[part.len - 1];

            if (start == '[' and end != ',' and end != ']') {
                try writePart(writer, part, null, ']');
            } else if (start == ',' and end != ']') {
                try writePart(writer, part[1..], '[', ']');
            } else if (start == ',' and end == ']') {
                try writePart(writer, part[1..], '[', null);
            }
        }
        
    } else {
        try writer.writeInt(u32, @intCast(message.len), .little); // maybe .native?
        try writer.writeAll(message);
    }
    try writer.flush();
}

fn writePart(writer: *std.Io.Writer, message: []const u8, prepend: ?u8, append: ?u8) !void {
    var length: usize = 0;
    if (prepend != null) {
        length += 1;
    }
    if (append != null) {
        length += 1;
    }
    
    try writer.writeInt(u32, @intCast(length), .little); // maybe .native?
    
    if (prepend) |char| {
        try writer.writeByte(char);
    }
    try writer.writeAll(message);
    if (append) |char| {
        try writer.writeByte(char);
    }
}

pub fn main(init: std.process.Init) !void {
    const alloc = init.gpa;
    const io = init.io;
    
    const stdin = std.Io.File.stdin();
    var stdin_buffer: [4096]u8 = undefined;
    var stdin_reader = stdin.reader(io, &stdin_buffer);
    
    const stdout = std.Io.File.stdout();
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = stdout.writer(io, &stdout_buffer);
    
    while (true) {
        const message = try getMessage(&stdin_reader.interface, alloc);
        defer alloc.free(message);
        try sendMessage(&stdout_writer.interface, message);
    }
}

Note that I have not tested this code, so it may contain small mistakes.
If you have any questions, let me know.

Works for the 1 MiB case. I test Native Messaging hosts outside of the browser, on the command line, with this: NativeMessagingHosts/nm_standalone_test.js at main · guest271314/NativeMessagingHosts · GitHub

zig build-exe nm_zig_ziggit.zig -O ReleaseSmall
~/bin/nm_standalone_test.js ./nm_zig_ziggit
{ path: "./nm_zig_ziggit", allowed_origin: undefined }
Testing ./nm_zig_ziggit Native Messaging host

{ messageLength: 1048576 }
{
  message: [
    null, null, null, null, null, null, null, null, null, null,
    null, null, null, null, null, null, null, null, null, null,
    null, null, null, null, null, null, null, null, null, null,
    null, null, null, null, null, null, null, null, null, null,
    null, null, null, null, null, null, null, null, null, null,
    null, null, null, null, null, null, null, null, null, null,
    null, null, null, null, null, null, null, null, null, null,
    null, null, null, null, null, null, null, null, null, null,
    null, null, null, null, null, null, null, null, null, null,
    null, null, null, null, null, null, null, null, null, null,
    ... 209615 more items
  ]
}
{ messageLength: 6 }
{ message: "test" }
{ messageLength: 2 }
{ message: "" }
{ messageLength: 1 }
{ message: 1 }
{ messageLength: 8 }
{ message: { "0": 97 } }

Adjust the native host manifest to point to that executable

{
  "name": "nm_zig",
  "description": "Zig Native Messaging host",
  "path": "/home/user/native-messaging-zig/nm_zig_ziggit",
  "type": "stdio",
  "allowed_origins": [
    "..."
  ]
}

Does not work for the 64 MiB case. Tested on Chromium Version 151.0.7888.0 (Developer Build) (64-bit) x86_64 Linux in DevTools

var data = Array(209715*64);
var len = data.length;
var n = 0;
var port = chrome.runtime.connectNative("nm_zig");
port.onMessage.addListener((message) => {
  n += message.length;
  if (n === len) {
    console.log({n, len});
    port.disconnect();
  }
});
port.onDisconnect.addListener((_) => {
  console.log("Disconnected");
  if (chrome.runtime.lastError) {
    console.log(chrome.runtime.lastError);
  }
});
port.postMessage(data);
data.length = 0;
0
tab.html:1 The sender sent an invalid JSON message; message ignored.

Keep in mind, currently the protocol is JSON. So we have to send back valid JSON. In this case that means a valid JSON Array, at 1 MiB or less. A generic Array created in JavaScript with Array(209715) has a JSON length of 1024**2, e.g., JSON.stringify(Array(209715)).length is 1048576, because the JSON Array is filled with null for uninitialized values. We have to send back valid JSON with a string length (or the length of a string encoded as u8 that decodes to valid JSON) of 1048576 or less.

If Zig has something like ECMA-262 Array.prototype.slice() we can just do something like “for i of array slice 209715”.
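The 1048576 figure follows from simple counting: each null is 4 characters, there is one comma between neighbours, and two brackets wrap the whole thing. Here is a small JavaScript sketch of that arithmetic, plus the slicing idea applied to an Array rather than to bytes. The names nullArrayJsonLength, sliceArray and PER_MESSAGE are hypothetical.

```javascript
// Sketch only: why Array(209715) serializes to exactly 1 MiB of JSON.
// nullArrayJsonLength, sliceArray and PER_MESSAGE are hypothetical names.

// Length of JSON.stringify(Array(n)): "[" + n * "null" + (n - 1) * "," + "]".
function nullArrayJsonLength(n) {
  return n === 0 ? 2 : 2 + 4 * n + (n - 1);
}

// Yield consecutive slices of at most `size` elements.
function* sliceArray(array, size) {
  for (let i = 0; i < array.length; i += size) {
    yield array.slice(i, i + size);
  }
}

// Largest count of null elements whose JSON fits in 1024 ** 2 characters.
const PER_MESSAGE = 209715;

for (const part of sliceArray(Array(PER_MESSAGE * 2), PER_MESSAGE)) {
  console.log(part.length, JSON.stringify(part).length);
}
```

This only holds when every element serializes to 4 characters. With real byte values from 0 to 255 the elements are 1 to 3 characters each, so a host that sees only bytes still has to split on commas rather than on an element count.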

Where is char coming from in writePart?

I did find some mistakes in my code.
First of all, when checking for the start and end characters of the part, an else clause is missing:

} else if (start == ',' and end == ']') {
    try writePart(writer, part[1..], '[', null);
} else {
    // Don't modify the message
    try writePart(writer, part, null, null);
}

And in writePart, the length of the message itself was not added:

fn writePart(writer: *std.Io.Writer, message: []const u8, prepend: ?u8, append: ?u8) !void {
    var length: usize = message.len;
    if (prepend != null) {

Finally, we do not add the last part of the message. I think we can fix that by changing the while condition to:

while (index < message.len) {

I have not yet been able to test it, so it may still be wrong.

What I don’t understand is: when start == ‘[’ and end == ‘,’ shouldn’t we need to replace the comma with a closing bracket? But I can’t read this in your original code.

Zig can slice arrays easily with array[start..end]. However, we are only looking at the JSON as a string, so we would slice on the string instead of the array that it represents.

prepend is an optional character, so it may contain a character, or contain null. We can use an if statement to check if it contains a character, and if that’s the case, capture the character in char. So |char| declares a new variable.

I don’t think that condition exists in the code. When 64 MiB of JSON Array is passed from JavaScript there’s only one (1) opening [. So, that would be the case of the first read after the length is read, and you happen to wind up on a , (44), replace comma with closing ] I think is what we do. Else we’d have to store that , and wait for the next value, then insert ] - which leads to comma being at next read, replace with [ - so we are always sending [0, 255].

Kindly share the whole code like you did in the first post so I can run it.

Just keep in mind that [255, 0] is JSON - which is a string format.

I’ve been lobbying in Chromium and Web extension world for Uint8Array option for the protocol for a while Chromium.

These C and C++ hosts, which indefinitely stream real-time PCM capture from parec to the browser, might help show how I constructed the JSON Arrays for an actual use case, rather than the base tests I am doing here with Zig.

void sendMessage(uint8_t *response) {
  const uint32_t responseLength = strlen(response);
  fwrite(&responseLength, sizeof responseLength, 1, stdout);
  fwrite(response, responseLength, 1, stdout);
  fflush(stdout);
}
// Exclude double quotation marks from beginning and end of string
// https://stackoverflow.com/a/67259615
char *strdelch(char *str, char ch) {
  char *current = str;
  char *tail = str;
  while (*tail) {
    if (*tail == ch) {
      tail++;
    } else {
      *current++ = *tail++;
    }
  }
  *current = 0;
  return str;
}


int main(void) {
  size_t messageLength = 0;
  uint8_t *const message = getMessage(&messageLength);
  char *command = strdelch((char *)message, '"');
  uint8_t buffer[1764]; // 441 * 4
  char *output = malloc((1764 * 4) + 3);
  FILE *pipe = popen(command, "r");
  free(message);
  while (1) {
    size_t count = fread(buffer, 1, sizeof(buffer), pipe);
    output[0] = '[';
    output[1] = 0;
    for (size_t i = 0; i < count; i++) {
      char data[5];
      sprintf(data, "%d", buffer[i]);
      strcat(output, data);
      if (i < count - 1) {
        strcat(output, ",");
      }
    }
    strcat(output, "]");
    sendMessage((uint8_t *)output);
  }
  free(output);
  return 0;
}

int main() {
  string message = getMessage();
  size_t length = 1764; // 441 * 4
  uint8_t buffer[length]; 
  string output;
  output.reserve((length * 4) + 2);
  // Exclude double quotation marks from beginning and end of string
  FILE *pipe = popen(message.substr(1, message.length() - 2).c_str(), "r");
  while (true) {
    size_t count = fread(buffer, 1, sizeof(buffer), pipe);   
    output += "[";
    for (size_t i = 0; i < count; i++) {
      output += to_string(buffer[i]);
      if (i < count - 1) {
        output += ",";
      }
    }
    output += "]";
    sendMessage(output);
    output.erase(output.begin(), output.end());
  }
}

I now have the following:

const std = @import("std");

fn getMessage(reader: *std.Io.Reader, alloc: std.mem.Allocator) ![]u8 {
    const header = try reader.takeInt(u32, .native);
    const output = try alloc.alloc(u8, header);
    errdefer alloc.free(output);
    try reader.readSliceAll(output);
    return output;
}

fn sendMessage(writer: *std.Io.Writer, message: []const u8) !void {
    if (message.len > 1024 * 1024) {
        var from_index: usize = 1024 * 1024 - 8;
        var index: usize = 0;
        
        while (index < message.len) {
            const i = std.mem.findScalarPos(u8, message, from_index, ',') orelse message.len;
            const part = message[index .. i];
            index = i;
            from_index += 1024 * 1024 - 8;
            
            const start = part[0];
            const end = part[part.len - 1];

            if (start == '[' and end != ',' and end != ']') {
                try writePart(writer, part, null, ']');
            } else if (start == ',' and end != ']') {
                try writePart(writer, part[1..], '[', ']');
            } else if (start == ',' and end == ']') {
                try writePart(writer, part[1..], '[', null);
            } else {
                try writePart(writer, part, null, null);
            }
        }
        
    } else {
        try writer.writeInt(u32, @intCast(message.len), .native);
        try writer.writeAll(message);
    }
    try writer.flush();
}

fn writePart(writer: *std.Io.Writer, message: []const u8, prepend: ?u8, append: ?u8) !void {
    var length: usize = message.len;
    if (prepend != null) {
        length += 1;
    }
    if (append != null) {
        length += 1;
    }
    
    try writer.writeInt(u32, @intCast(length), .native);
    
    if (prepend) |char| {
        try writer.writeByte(char);
    }
    try writer.writeAll(message);
    if (append) |char| {
        try writer.writeByte(char);
    }
}

pub fn main(init: std.process.Init) !void {
    const alloc = init.gpa;
    const io = init.io;
    
    const stdin = std.Io.File.stdin();
    var stdin_buffer: [4096]u8 = undefined;
    var stdin_reader = stdin.reader(io, &stdin_buffer);
    
    const stdout = std.Io.File.stdout();
    var stdout_buffer: [4096]u8 = undefined;
    var stdout_writer = stdout.writer(io, &stdout_buffer);
    
    while (true) {
        const message = try getMessage(&stdin_reader.interface, alloc);
        defer alloc.free(message);
        try sendMessage(&stdout_writer.interface, message);
    }
}

Yep, that works. I’ll test your version against the one the online “code converter” spit out, and probably just throw your code in as a substitute. And compile to WASM. Without objection… Thanks.


The only thing I’d do a little differently, structure-wise, is include writePart in sendMessage. I usually just expose 3 symbols, getMessage, sendMessage, and main. Maybe an encodeMessage to send errors to STDOUT rather than STDERR.

How would I go about placing writePart inside of sendMessage function body?

I don’t see why you would want to do that. The writePart function is not exposed, since it’s not marked as pub. Only declarations that are marked with pub are available to other files, the others are private. If you really want it, it’s possible by wrapping the function in a struct, but it’s much less readable than just using a private helper function.

I don’t see why you would want to do that. The writePart function is not exposed, since it’s not marked as pub. Only declarations that are marked with pub are available to other files, the others are private.

The why is to keep the same code structure of getMessage, sendMessage, main pattern that I try to maintain across all hosts.

If you really want it, it’s possible by wrapping the function in a struct, but it’s much less readable than just using a private helper function.

Then it’s a bonus in this case: I learn how to do that in Zig.

You could do it like this:

const writePart = struct {
    fn f(write: *std.Io.Writer, send: []const u8, prepend: ?u8, append: ?u8) !void {
        var length: usize = send.len;
        if (prepend != null) {
            length += 1;
        }
        if (append != null) {
            length += 1;
        }
        
        try write.writeInt(u32, @intCast(length), .native);
        
        if (prepend) |char| {
            try write.writeByte(char);
        }
        try write.writeAll(send);
        if (append) |char| {
            try write.writeByte(char);
        }
    }
}.f;

if (start == '[' and end != ',' and end != ']') {
    try writePart(writer, part, null, ']');
} else if (start == ',' and end != ']') {
    try writePart(writer, part[1..], '[', ']');
} else if (start == ',' and end == ']') {
    try writePart(writer, part[1..], '[', null);
} else {
    try writePart(writer, part, null, null);
}

But you could also restructure the function so it doesn’t need the helper function:

const i = std.mem.findScalarPos(u8, message, from_index, ',') orelse message.len;
var part = message[index .. i];
index = i;
from_index += 1024 * 1024 - 8;

const start = part[0];
const end = part[part.len - 1];

var append: ?u8 = null;
var prepend: ?u8 = null;

if (start == '[' and end != ',' and end != ']') {
    append = ']';
} else if (start == ',' and end != ']') {
    part = part[1..];
    prepend = '[';
    append = ']';
} else if (start == ',' and end == ']') {
    part = part[1..];
    prepend = '[';
}

var length: usize = part.len;
if (prepend != null) {
    length += 1;
}
if (append != null) {
    length += 1;
}

try writer.writeInt(u32, @intCast(length), .native);

if (prepend) |char| {
    try writer.writeByte(char);
}
try writer.writeAll(part);
if (append) |char| {
    try writer.writeByte(char);
}

I was looking for the second option. Thanks for the example of both options.

Public Zig code updated with attribution here