My CS background is somewhat limited (and about 4 years behind me now), so I’ve been trying to get a better grasp on things with Zig and some hobby projects. One such project involves calls to MusicBrainz. I had some ideas about how to approach that and eventually implemented them, and after some time I got around to looking at GrooveBasin to see how Andrew did it. Beyond much nicer-looking code, he had a comment about using streaming JSON that I didn’t quite understand, but I eventually decided he must have meant using Readers and Writers. My memory of such concepts was about as strong as a breeze against a brick wall, so I watched some of the recent talks about how the new Io interface works, and I think I pieced it together, but I have some questions about what I eventually produced versus the original GrooveBasin code.
The original (note that according to git blame this code appears to have been written just prior to the release of 0.12):
pub fn lookup(
    arena: Allocator,
    http_client: *std.http.Client,
    recording_id: []const u8,
) !Response {
    var server_header_buffer: [16 * 1024]u8 = undefined;
    // musicbrainz can return a lot of data; this should switch to use the json
    // streaming API.
    const json_read_buffer = try arena.alloc(u8, 2 * 1024 * 1024);
    var req = try http_client.open(.GET, .{
        .scheme = "https",
        .host = .{ .percent_encoded = "musicbrainz.org" },
        .path = .{ .percent_encoded = try std.fmt.allocPrint(arena, "/ws/2/recording/{s}", .{recording_id}) },
        .query = .{ .percent_encoded = "inc=work-rels+artist-credits+releases+discids" },
    }, .{
        .server_header_buffer = &server_header_buffer,
        .headers = .{
            .user_agent = .{ .override = player.http_user_agent },
        },
        .extra_headers = &.{
            .{ .name = "accept", .value = "application/json" },
        },
    });
    defer req.deinit();
    try req.send();
    try req.wait();
    if (req.response.status != .ok)
        return error.HttpRequestFailed;
    const content_type = req.response.content_type orelse
        return error.HttpResponseMissingContentType;
    const mime_type_end = std.mem.indexOf(u8, content_type, ";") orelse content_type.len;
    const mime_type = content_type[0..mime_type_end];
    if (!std.ascii.eqlIgnoreCase(mime_type, "application/json"))
        return error.HttpResponseNotJson;
    const raw_json = json_read_buffer[0..(try req.readAll(json_read_buffer))];
    const response = try std.json.parseFromSliceLeaky(Response, arena, raw_json, .{
        .ignore_unknown_fields = true,
    });
    return response;
}
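If I understand the comment right, the streaming API it refers to already existed at that point, so I think the buffered read at the end could have been replaced with something like this (my guess, untested; `std.json.reader` was the 0.12-era helper that feeds the parser from any reader incrementally):

var json_reader = std.json.reader(arena, req.reader());
const response = try std.json.parseFromTokenSourceLeaky(Response, arena, &json_reader, .{
    .ignore_unknown_fields = true,
});
return response;

That would have made the 2 MiB json_read_buffer unnecessary, since the parser pulls bytes from the response as it scans.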
My own:
pub fn lookup(arena: std.mem.Allocator, http_client: *std.http.Client, recording_id: []const u8) !Response {
    // Size chosen based on GrooveBasin
    var json_read_buffer: [2 * 1024 * 1024]u8 = undefined;
    var writer = std.Io.Writer.fixed(&json_read_buffer);
    const res = try http_client.fetch(.{
        .method = .GET,
        .location = .{
            .uri = .{
                .scheme = "https",
                .host = .{ .percent_encoded = "musicbrainz.org" },
                .path = .{ .percent_encoded = try std.fmt.allocPrint(arena, "/ws/2/recording/{s}", .{recording_id}) },
                .query = .{ .percent_encoded = "inc=work-rels+artist-credits+releases+discids" },
            },
        },
        .headers = .{ .user_agent = .{ .override = "my_super_awesome_user_agent" } },
        .extra_headers = &.{.{ .name = "accept", .value = "application/json" }},
        .response_writer = &writer,
    });
    if (res.status != .ok) return error.HttpRequestFailed;
    if (!(try std.json.validate(arena, writer.buffered()))) return error.MalformedJson;
    var reader = std.Io.Reader.fixed(writer.buffered());
    var json_reader = std.json.Scanner.Reader.init(arena, &reader);
    const response = try std.json.parseFromTokenSourceLeaky(Response, arena, &json_reader, .{ .ignore_unknown_fields = true });
    //const response = try std.json.parseFromSliceLeaky(Response, arena, writer.buffered(), .{ .ignore_unknown_fields = true });
    try writer.flush();
    return response;
}
- Is setting up the `json_reader` necessary? Without a Reader, I would have read straight from the Writer’s buffer (shown in the last comment). Does having the Reader improve performance by moving only smaller pieces of data, rather than (potentially) much larger loads of memory, or does it maybe not matter?
- If I should have the Reader (or in future situations where it is warranted), does it make sense for me to make its buffer `writer.buffered()`, or is there something dangerous there? Somehow “reaching into” the object makes me uncomfortable (I understand that probably sounds silly), but it’s what made my initial test pass, so I figured I must have been doing something right.
- Perhaps this is more specifically a question for Andrew (and my intuition is that this is sorta scrupulous anyways), but I’m wondering if my error checking for JSON leaves something to be desired. Before, one would check via `Content-Type`, and seeing as you can’t check that when using `std.http.Client.fetch()`, I chose to use `std.json.validate()`. The goal of checking that it’s (valid) JSON is still achieved, but it seems that he had access to `validate()` even when he wrote the code, so it makes me question if it’s the right choice.
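For what it’s worth, here is my current understanding of what “real” streaming would look like in my version: `std.json.Scanner.Reader.init` takes any `*std.Io.Reader`, so if I could get at the response body’s reader directly (which `fetch()` doesn’t seem to expose — hence my fixed buffer), the parse would be something like this sketch (assuming a `body_reader: *std.Io.Reader` obtained from the response; untested):

// Sketch: no intermediate 2 MiB buffer; the scanner pulls bytes
// from body_reader as it tokenizes.
fn parseStreaming(arena: std.mem.Allocator, body_reader: *std.Io.Reader) !Response {
    var json_reader = std.json.Scanner.Reader.init(arena, body_reader);
    return std.json.parseFromTokenSourceLeaky(Response, arena, &json_reader, .{
        .ignore_unknown_fields = true,
    });
}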
Those were my main points of curiosity, but if anything else sticks out, I’d be happy to hear it. I’m having a blast learning more about Zig/programming in general!