How to gather partial strings in JSON

I’m trying to figure out how to gather partial strings with allocNextIntoArrayList. The problem is that I keep getting nulls.

The string that I’m parsing is "Habitat \u0026 Description".
I think that arrives as a .partial_string, a .partial_string_escaped_1, and a .string token.

Paring down the code to the essential bits:

            var scanner = std.json.Scanner.initCompleteInput(self.allocator, annos);
            defer scanner.deinit();
            while (true) {
                switch (try scanner.peekNextTokenType()) {
                    .end_of_document => {
                        std.debug.print("EOD\n", .{});
                        break;
                    },
                   <<<snip>>>
                    .string => switch (try scanner.next()) {
                        .string => |value| {
                            std.debug.print("string {s}\n", .{value});
                        },
                        .partial_string,
                        .partial_string_escaped_1,
                        .partial_string_escaped_2,
                        .partial_string_escaped_3,
                        .partial_string_escaped_4,
                        => {
                            var buff = std.ArrayListAligned(u8, null).init(self.allocator);
                            defer buff.deinit();
                            const str = (try scanner.allocNextIntoArrayList(&buff, .alloc_if_needed)) orelse "";
                            std.debug.print("PARTIAL STRING '{s}'\n", .{str});
                        },
                        else => unreachable,
                    },
                   <<<snip>>>
                }
            }

Hello @telesphore
Welcome to ziggit 🙂

allocNextIntoArrayListMax documentation mentions:

The next token type must be either .number or .string.

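But you are calling scanner.next() in the .string branch first, so the first piece of the string has already been consumed by the time allocNextIntoArrayList runs.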
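To make that concrete, here is a minimal sketch of how I would expect the call to look: peek, then hand the whole value to allocNextIntoArrayList without calling next() first. It assumes Zig 0.11 through 0.14, where std.ArrayList(u8).init(allocator) is the managed list used in the original post. readValue and the sample input are made up for illustration. As I read the doc comments, a null return is not an error: it means the pieces were appended to the list, so the text is in buf.items.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std.json): returns the next string or
/// number value in one piece. Call it only right after peekNextTokenType()
/// reported .string or .number, and before any next() on that value.
fn readValue(scanner: *std.json.Scanner, buf: *std.ArrayList(u8)) ![]const u8 {
    // The scanner never resets the list, so clear it between values.
    buf.clearRetainingCapacity();
    // A slice means no copy was needed; null means the pieces are in buf.
    return (try scanner.allocNextIntoArrayList(buf, .alloc_if_needed)) orelse buf.items;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Sample input (an assumption); "\\u0026" is a literal \u0026 in the JSON.
    const input = "[\"Habitat \\u0026 Description\", \"plain\"]";
    var scanner = std.json.Scanner.initCompleteInput(allocator, input);
    defer scanner.deinit();

    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    while (true) {
        switch (try scanner.peekNextTokenType()) {
            .end_of_document => break,
            .string => {
                const str = try readValue(&scanner, &buf);
                std.debug.print("string '{s}'\n", .{str});
            },
            else => {
                _ = try scanner.next();
            },
        }
    }
}
```

The returned slice points either into the input or into buf, so copy it if it has to outlive the next call.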

Thank you for your reply.

I think, but I am likely wrong, that I’m calling next on the look-ahead value to get what’s in the look-ahead. Maybe I don’t understand what peekNextTokenType does then.

Let me include more of the code to show what I’m trying to accomplish… well it’s in exploration mode for now.

            while (true) {
                switch (try scanner.peekNextTokenType()) {
                    .end_of_document => {
                        std.debug.print("EOD\n", .{});
                        break;
                    },
                    .array_begin => {
                        _ = try scanner.next();
                        std.debug.print("array begin\n", .{});
                    },
                    .array_end => {
                        _ = try scanner.next();
                        std.debug.print("array end\n", .{});
                    },
                    .object_begin => {
                        _ = try scanner.next();
                        std.debug.print("object begin\n", .{});
                    },
                    .object_end => {
                        _ = try scanner.next();
                        std.debug.print("object end\n", .{});
                    },
                    .number => {
                        _ = try scanner.next();
                        std.debug.print("number\n", .{});
                    },
                    .string => switch (try scanner.next()) {
                        .string => |value| {
                            std.debug.print("string {s}\n", .{value});
                        },
                        .partial_string,
                        .partial_string_escaped_1,
                        .partial_string_escaped_2,
                        .partial_string_escaped_3,
                        .partial_string_escaped_4,
                        => {
                            var buff = std.ArrayListAligned(u8, null).init(self.allocator);
                            defer buff.deinit();
                            const str = (try scanner.allocNextIntoArrayList(&buff, .alloc_if_needed)) orelse "";
                            std.debug.print("PARTIAL STRING '{s}'\n", .{str});
                        },
                        else => unreachable,
                    },
                    .true => {
                        _ = try scanner.next();
                        std.debug.print("true\n", .{});
                    },
                    .false => {
                        _ = try scanner.next();
                        std.debug.print("false\n", .{});
                    },
                    .null => {
                        _ = try scanner.next();
                        std.debug.print("null\n", .{});
                    },
                    // else => {
                    //     std.debug.print("error\n", .{});
                    //     break;
                    // },
                }
            }
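For anyone following along, a small experiment along these lines should show the difference between the two calls. This is a sketch under the same assumptions as the code above (complete input, Zig 0.11 through 0.14), with a sample input of my own: peekNextTokenType only reports the kind of token coming up and consumes nothing, while each next() consumes and returns one piece of the string.

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Sample input (an assumption): a single JSON string with one escape.
    const input = "\"Habitat \\u0026 Description\"";
    var scanner = std.json.Scanner.initCompleteInput(allocator, input);
    defer scanner.deinit();

    // Peeking twice gives the same answer because nothing is consumed.
    const first = try scanner.peekNextTokenType();
    const second = try scanner.peekNextTokenType();
    std.debug.print("peek: {s}, peek again: {s}\n", .{ @tagName(first), @tagName(second) });

    // Each next() consumes one piece; the pieces are never joined for you.
    while (true) {
        switch (try scanner.next()) {
            .partial_string => |slice| std.debug.print("partial_string '{s}'\n", .{slice}),
            .partial_string_escaped_1 => |bytes| std.debug.print("partial_string_escaped_1 '{s}'\n", .{bytes[0..]}),
            .string => |slice| std.debug.print("string '{s}'\n", .{slice}),
            .end_of_document => break,
            else => std.debug.print("(other token)\n", .{}),
        }
    }
}
```

This is why the loop above loses text: the piece returned by the outer next() is dropped before allocNextIntoArrayList collects the rest.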

So the issue was that I was making things waaay more complicated (and buggy) than they needed to be. Extracting the strings looks more like this:

  .string => {
      const token = try scanner.nextAlloc(self.allocator, .alloc_always);
      std.debug.print("string '{s}'\n", .{token.allocated_string});
  },

The source code is quite readable, which helped greatly.
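In case it helps the next reader, here is the same idea as a complete program. It is a sketch only: the input and the GeneralPurposeAllocator setup are my assumptions, not the original project's code. With .alloc_always the caller owns the returned allocated_string, so the sketch frees it after printing; with an arena allocator that step could be dropped.

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Sample input (an assumption); object keys are reported as strings too.
    const input = "{\"title\": \"Habitat \\u0026 Description\"}";
    var scanner = std.json.Scanner.initCompleteInput(allocator, input);
    defer scanner.deinit();

    while (true) {
        switch (try scanner.peekNextTokenType()) {
            .end_of_document => break,
            .string => {
                // nextAlloc joins all the partial pieces into one buffer.
                const token = try scanner.nextAlloc(allocator, .alloc_always);
                switch (token) {
                    .allocated_string => |s| {
                        defer allocator.free(s); // caller owns the copy
                        std.debug.print("string '{s}'\n", .{s});
                    },
                    else => unreachable, // .alloc_always always copies strings
                }
            },
            else => {
                _ = try scanner.next();
            },
        }
    }
}
```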
