Reading from sqlite db in `while` loop causes wrong string output, but works if using `if`

Hi, I am learning Zig and have encountered weird behavior (see example below). I am pretty sure I’m misunderstanding something for this error to occur, but I can’t figure out what exactly :pensive_face: Can somebody please review my code? I’ve tried to simplify it as much as possible and have a minimal reproducible example.

What I also know is, if I replace the while below with just an if it works; it also works if I add break; at the end of the loop :face_with_spiral_eyes:

My Environment

Zig: 0.15.1
OS: MacOS Sonoma 14.7.2

Steps to reproduce

  1. zig init
  2. Install zig-sqlite
    $ zig fetch --save=sqlite https://github.com/nDimensional/zig-sqlite/archive/refs/tags/v0.3.2-3500400.tar.gz```
    
  3. Update build.zig
    fn build(b: *std.Build) void {
        const exe = b.addExecutable(.{ ... });
        // ...
    
        exe.root_module.addImport("sqlite", b.dependency("sqlite", .{}).module("sqlite"));
    }
    
  4. Replace src/main.zig
    const std = @import("std");
    
    // third-party
    const sqlite = @import("sqlite");
    
    const Item = struct { body: sqlite.Text };
    
    pub fn main() !void {
        // start setup
        const db = try sqlite.Database.open(.{ .path = "db.sqlite" });
        try db.exec("DROP TABLE IF EXISTS t;", .{});
        try db.exec("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, body TEXT);", .{});
        try db.exec("INSERT INTO t (body) VALUES ('test me');", .{});
        defer db.close();
        // end setup
    
        // start problem code
        const select = try db.prepare(struct {}, Item, "SELECT * FROM t;");
        defer select.finalize();
        try select.bind(.{});
        defer select.reset();
    
        var read_item: Item = undefined;
        while (try select.step()) |item| {
            read_item = item;
        }
        try std.testing.expectEqualStrings("test me", read_item.body.data);
        // end problem code
    }
    
  5. zig build run

Expected output

# no output

Real output

$ zig build run

====== expected this output: =========
test me␃

======== instead found this: =========
0o(␃

======================================
First difference occurs on line 1:
expected:
test me
^ ('\x74')
found:
0o(
^ ('\x30')
error: TestExpectedEqual
/opt/homebrew/Cellar/zig/0.15.1/lib/zig/std/testing.zig:673:9: 0x10537bf1b in expectEqualStrings (t)
        return error.TestExpectedEqual;
        ^
/tmp/t/src/main.zig:28:5: 0x10537cd87 in main (t)
    try std.testing.expectEqualStrings("test me", read_item.body.data);
    ^
run
└─ run exe t failure
error: the following command exited with error code 1:
/tmp/t/zig-out/bin/t

Build Summary: 3/5 steps succeeded; 1 failed
run transitive failure
└─ run exe t failure

error: the following build command failed with exit code 1:

Decided to create an issue in zig-sqlite in case this is a bug

I believe that the memory for a given step is managed by sqlite and is invalidated on the next call to step or reset. So item is only valid within the while loop body, if you want to keep any of the associated data around it will have to be copied into memory that you manage yourself.

It works with an if statement because there is no second call to step.

5 Likes

You may also want to use one of the easier functions like row for a single row.

2 Likes

So item is only valid within the while loop body, if you want to keep any of the associated data around it will have to be copied into memory that you manage yourself.

Thanks! Can you please give an example how I can do that? I thought by assigning it to read_item I am actually copying it.

Sadly this library does not provide row

Sorry, it’s called one, I mixed it up with another library. Example is in there.

2 Likes

Apart from the memory issues (please read the docs, they explicitly say this), there are some other issues with your code, at the SQL level:

You should remove the semicolon at the end of your SQL statement. Probably sqllite kindly ignores it, other DBs/drivers could raise an error (eg Oracle JDBC).
The semicolon is not really part of an SQL statement, instead it’s used in scripts to separate statements from each other.
It’s a common mistake.

And your table has two columns, which you select with *, but your struct only has one column.
It seems like this lib automatically ignores unmatched columns, but you shouldn’t rely on that.

Last but not least, one should avoid select * except while exploring the DB interactively.
The reasoning behind this is that tables tend to get additional columns over time.

In a code review for production-level code, I would consider these three issues as programming errors.

2 Likes

The sqlite.Text struct just contains one string. That’s a slice which in turn contains a pointer. When you copy the item like you do, you copy that pointer. The lib then pulls the data behind the pointer out under your feet. You need to copy the content of the slice with an allocator or just a buffer if you want to keep it. I think, the allocator interface has a simple dupe which is probably the easiest way to do that.

2 Likes

Oh) You are linking to a different library. I am using this one: GitHub - nDimensional/zig-sqlite: Simple, low-level, explicitly-typed SQLite bindings for Zig.. Thanks anyway)

Thank you, everyone, for the pointers! I also got a reply from maintainers

The problem is that the data of each step() result is only valid until the next call to step(). What’s happening in your code is that when you copy the result with read_item = item, this is only a shallow copy of the Item struct, and SQLite only guarantees you that the pointer item.body.data.ptr: [*]const u8 points to a real value until the next call to step(). The while loop calls step() until it returns null, after which SQLite is doing some internal cleanup and the text value is no longer located at that pointer address. It doesn’t cause a problem with if because there’s no subsequent call to step().

Anyway, the right thing to do is to copy the entire item.body.data slice, either using an allocator to alloc a new slice or copying it into fixed buffer. Depends on what you ultimately need to do with the data.

As @hvbargen mentioned, I missed this line in README.md

Text and blob values must not be retained across steps. You are responsible for copying them.

Here is a fixed code

const std = @import("std");

// third-party
const sqlite = @import("sqlite");

const Item = struct { body: sqlite.Text, id: u32 };

pub fn main() !void {
    // start setup
    const db = try sqlite.Database.open(.{ .path = "db.sqlite" });
    try db.exec("DROP TABLE IF EXISTS t", .{});
    try db.exec("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, body TEXT)", .{});
    try db.exec("INSERT INTO t (body) VALUES ('test me')", .{});
    defer db.close();
    // end setup

    // start problem code
    var allocator = std.heap.DebugAllocator(std.heap.DebugAllocatorConfig{}){};
    defer std.testing.expect(allocator.deinit() == .ok) catch @panic("leak");
    const gpa = allocator.allocator();
    // const gpa = std.heap.page_allocator;

    const select = try db.prepare(struct {}, Item, "SELECT id, body FROM t");
    defer select.finalize();
    try select.bind(.{});
    defer select.reset();

    var read_item: []u8 = undefined;
    while (try select.step()) |item| {
        read_item = try gpa.dupe(u8, item.body.data);
    }
    // in the real code ArenaAllocator is used, so not freeing all instances of `read_item` here is ok
    // the goal of the example was to fix shallow copy issue
    defer gpa.free(read_item);

    // end problem code
    try std.testing.expectEqualStrings("test me", read_item);
}
1 Like

You might care that you only free the last item from your while loop, but are duping every one. So if there are 100 items, you will currently leave 99 dupes allocated.

2 Likes

If you call gpa.deinit(), it will hopefully tell you about that anyway.

Yes, thanks! DebugAllocator caught that for me) I decided to use ArenaAllocator in this case because in the actual code the container for read_items is used outside of the function it gets initiated.