Nice post, Loris.
Taking the query as a comptime string is interesting. I'd explore going the other direction: build a query as a type which has a way to render into the query string. I think that's distinct from ORM because it does not try to force OOP; it only creates types for various kinds of queries.
i.e. something like this:
pub fn Rows(comptime T: type) type {
    return struct {
        db_rows: db.Rows,

        pub const Field = std.meta.FieldEnum(T);

        pub const Row = struct {
            db_row: db.Row,

            pub fn int(row: Row, field: Field) u32 {
                // FieldEnum(T) tag values are the field indices of T
                return row.db_row.int(@intFromEnum(field));
            }

            pub fn text(row: Row, field: Field) []const u8 {
                return row.db_row.text(@intFromEnum(field));
            }
        };

        pub fn next(this: *@This()) ?Row {
            const db_row = this.db_rows.next() orelse return null;
            return .{ .db_row = db_row };
        }

        pub fn deinit(this: *@This()) void {
            this.db_rows.deinit();
        }
    };
}
const TableName = enum { messages };

pub fn selectFrom(comptime T: type, comptime table_name: TableName) Rows(T) {
    const query = "SELECT " ++ queryColumns(T) ++ " FROM " ++ @tagName(table_name);
    return .{ .db_rows = db.rows(query, .{}) catch db.fatal(@src()) };
}

inline fn queryColumns(comptime T: type) []const u8 {
    // TODO return field names of T joined with commas
}
test {
    var rows = selectFrom(struct {
        id: u32,
        origin: u32,
        channel: u32,
        author: u32,
        body: []const u8,
    }, .messages);
    defer rows.deinit();
    while (rows.next()) |r| {
        const msg: Message = .{
            .id = r.int(.id),
            .origin = r.int(.origin),
            .channel = r.int(.channel),
            .author = r.int(.author),
            .text = gpa.dupe(u8, r.text(.body)) catch oom(),
        };
        try state.latest_messages.pushFront(gpa, msg);
        log.debug("loaded chat message: {f}", .{msg});
    }
}
Here I used a struct to hint that a further enhancement would be to add type safety, but I ignored the field types in this example to make it match your example code from the blog post more closely.
You could also more simply make it return instances of your struct type from next(). I think that's the direction I would take this code.
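A minimal sketch of that variant, assuming the same hypothetical db module and only the two column kinds (int and text) used above; next() fills in a T by reading each column at its field index:

pub fn next(this: *@This()) ?T {
    var db_row = this.db_rows.next() orelse return null;
    var result: T = undefined;
    inline for (std.meta.fields(T), 0..) |f, i| {
        // comptime `if`: only the branch matching the field type is analyzed
        @field(result, f.name) = if (f.type == []const u8)
            db_row.text(i)
        else
            db_row.int(i);
    }
    return result;
}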
I think my personal preference is to use the SQL syntax itself as the "interface" as much as possible, but this is definitely a valid approach, and AFAIK not yet explored in Zig.
Prior art about query builders that I'm aware of:
Here are a couple of things that would complicate usage of struct definitions like in your example:
- How do you encode things like nested queries and usage of sql functions? Like this query for example:
var rows = db.conn.rows(
    \\SELECT
    \\  messages_search.author,
    \\  users.handle,
    \\  messages_search.channel,
    \\  channels.name,
    \\  highlight(messages_search, 2, char(0x1b) || '[7m', char(0x1b) || '[27m')
    \\FROM messages_search
    \\INNER JOIN users ON messages_search.author == users.id
    \\INNER JOIN channels ON messages_search.channel == channels.id
    \\WHERE body MATCH ?;
, .{query}) catch db.fatal(@src());
defer rows.deinit();

- Defining types like you did in a struct can help pin down the expected result type, but it can't be the authoritative source. The type of that column is actually defined by the types of the original columns involved in the operation, and then by the result of running the query on those. So ultimately writing types there is essentially the same as calling sqlite's r.int.
One last advantage from using naked SQL directly is that you can leverage existing SQL tooling, like editors that can help you with refactoring, which can be facilitated even more by keeping queries in .sql files and @embedFileing them.
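For example (the file name here is just made up), the full-text search query above could live next to the source and get pulled in at compile time, reusing the db.conn helpers from the earlier snippet:

const search_sql = @embedFile("search_messages.sql");

var rows = db.conn.rows(search_sql, .{query}) catch db.fatal(@src());
defer rows.deinit();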
I think the difference between these approaches is summed up with "which flavor of impedance mismatch do you prefer?" The type-first approach is limited in what valid queries it can generate, and the SQL-first approach is limited in what valid queries it can parse. Both require writing tons of glue code to marshal data between the db's tabular format and the application's in-memory data structures.
IMO the only way out is to use a database that works the same as in-memory data structures (like xitdb which I wrote) so there is no impedance mismatch at all. But if I had to stick to SQLite I would probably lean towards Loris' approach because I've suffered through too many query builders and always end up needing to drop down to raw SQL at some point anyway.
My idea would be something like this as an API for the user of the library:
fn select(db: *DB, Columns: type, comptime restOfStatement: []const u8) !RowIter(Columns);
which would be used like this:
var answer = try db.select(
    struct {
        id: u64,
        origin: u64,
        channel: u64,
        author: u64,
        body: []const u8,
    },
    "FROM messages",
);
defer answer.deinit();
while (answer.next()) |row| {
    // ...
}
Internally it could (very simply; the field names are the column names in the query) create the query string at comptime while still being type-safe and not needing the user to repeat everything.
So internally it would do something like this (some things are omitted):
const fields = std.meta.fields(Columns);
const queryString: []const u8 = comptime build: {
    var query: []const u8 = "SELECT " ++ fields[0].name;
    for (fields[1..]) |f| {
        query = query ++ ", " ++ f.name;
    }
    break :build query ++ " " ++ restOfStatement;
};
The more I think about it, the more I like the idea of just parsing naked SQL queries. The only downside I can think of is that, depending on which DB you decide to target, the syntax will have to change a little. Everything else will be pretty much the same compared to the query builder approach.
I'm not going to go down this rabbit hole any time soon, but a system that is able to analyze db queries will be able to look at schema definitions and use that information to resolve types of queries, and it will additionally be able to compute migration queries for schema upgrades.
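For instance (the table and column names here are purely hypothetical), a schema-aware tool that notices a new field in the declared message struct could emit the corresponding migration itself:

// declared schema gained `edited_at: ?i64`, so the tool would generate:
const migration = "ALTER TABLE messages ADD COLUMN edited_at INTEGER;";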
That seems a really cool thing to have, similar to Django's ORM (minus the ORM part), which in the past used to be a separate project called South, for those who remember.
I'm sure I could come up with a type-safe wrapper for that style of query. But you can also bypass the abstractions and use the lower level APIs for complex queries. Like you said, this is about getting a big bang for your buck.
That also looks like a PITA to parse at comptime.
Regarding separate sql files and tooling and such, I really can't fathom wanting to do that. I've made nontrivial applications before using sql databases and all my queries were CRUD.
But hey, maybe there's some software development landscape that I have a blind spot for.
Agreed, my preferred approach would be build time parsing and codegen for wrappers.
No, I think you're right: if the schema is designed well enough you don't do too much nesting most of the time, except when creating analytics dashboards, and even then you don't want those to run off of your main db.
That being said, a wrapper alone doesn't guarantee that columns have the type that you think they have; only knowledge of the db schema does.
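SQLite uses type affinity rather than strict column types, so unless a table is declared STRICT, a value of the "wrong" type is simply stored as-is. A tiny illustration (exec here is a stand-in for whatever statement-running helper the wrapper exposes, not a specific API):

// the column says INTEGER, but SQLite will happily keep TEXT in it
try db.exec("CREATE TABLE t (n INTEGER)", .{});
try db.exec("INSERT INTO t (n) VALUES ('oops')", .{}); // accepted, stored as TEXT
// a generated wrapper that reads n as u32 only finds out when this row comes back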
let me throw in my own simple sqlite wrapper: zigqlite
the last update is over a year old, so i guess it no longer compiles without some work.
i wrote it to explore how it feels to combine the things i like most about zig:
- type safety
- libraries that accept and adapt to the user's structs instead of forcing their own types onto the user
and yes, using naked SQL is a big part of usability IMHO. i'm still trying to find how to do some query construction without having to "hide" the SQL (and without really parsing it, if possible)
Can somebody with a much deeper understanding than mine connect the dots between this "idea of parsing naked SQL queries" and the lower-level Zitron that @mnemnion just posted? I appreciate that Zitron is more "normally"(?) suited to provide a parser for the likes of a sqlite interface lib itself (perhaps a native-Zig sqlite replacement, as an alternative to Zig sqlite wrappers(?)), but could it be valuable to higher-level use? For instance, if @kristoff's use case was NOT as "simple" as he proposes, but was a bigger project with nested queries, etc., and he still wanted type mapping, would a Zig-native Zitron-created sqlite parser be a resource? (Sorry if I'm way off base in my understanding - I'm more used to comfortable territory like Python sqlite3 usage as a magic black box.)
EDIT: I should clarify that I am a big raw-SQL fan, and have some distaste for ORMs, so this middle ground of specified-fields "checking" ("simple ORMish flat-mapping"), but with raw SQL exposed right up at the front end of the code, is my cup of tea. In case that leaning colors any interpretation of my interest in general use of Zitron.
Indeed, being able to parse your (complex) SQL queries opens up the possibility of extracting their semantics in a structured way, which could lend itself to all kinds of additional advantages, such as linking each column with an SQL data type, validating grouping / windowing calls, etc.
Nice article, Loris.
Some tangential ramblings related to the broader idea, and @andrewrk's remarks:
A while ago at this point, I made a rendering interface that heavily used comptime in its ergonomics. Here's an example shader definition:
const World = struct {
    u_offset: [2]f32,
};

const PerVertex = struct {
    a_position: [2]f32,
    a_color: [3]u8,

    pub const a_color_usage = rhi.IntegerUsage.normalized_float;
};

const variants = &[_]rhi.Variant{
    .{ .name = "v_color", .type = "vec3" },
};

const Program = rhi.Program(struct { World: World }, PerVertex, null, variants,
    \\void main(void) {
    \\    gl_Position = vec4(a_position + u_offset, 0.0, 1.0);
    \\    v_color = a_color * a_color;
    \\}
,
    \\out vec4 finalColor;
    \\void main(void) {
    \\    finalColor = vec4(sqrt(v_color), 1.0);
    \\}
);
This thing was pretty neat. It does things like automatically writing definitions for the types in glsl and prepending them to the shaders, so it was fully type safe between Zig and the shaders. All of the offsets are automatically calculated for setting up the VertexAttribPointer on the fields, for example. Then the draw commands would take arguments that match the types provided, e.g.:
global.program.draw(.{ .World = &global.uniform }, &global.vao);
global.uniform has type rhi.Uniform(World), and vao has type rhi.VertexArrayObject(PerVertex, null).
Despite being neat, ultimately when a future project needed a similar rendering interface, I pulled way back on the complexity of what I was doing with comptime. In practice, it didn't actually help solve real problems, and it made actually getting stuff done more frustrating. Looking at the two major claimed benefits of the above API:
- Autogeneration of glsl types from Zig types: "Write once" and "universal source of truth" sound really nice. But how much time does it actually take to write something twice? A few seconds? In practice, the drawbacks more than lost this time.
- The type safety would catch any bugs… that I would have immediately found anyway. The nastiest of bugs are those which occur under specific values in large ranges of possible program states. Getting types wrong here is very likely to cause large and immediate problems, so running it once is often sufficient testing. Any half-decent integration test will give sufficient confidence in the code, I feel.
On the other hand, there were significant downsides:
- The types weren't actually defined the same way always, so more info had to be added to the Zig side. Even just the backwards thinking of "I want the generated glsl to be __, so I need to define it in Zig as __" can take longer than just writing it twice.
- When debugging, I constantly had to compile and run the program in order to even see the generated glsl.
- The error line numbers were always wrong, unless looking at the generated glsl.
- The API was more complicated.
- Worse, the strictness of the API meant I would take 2 steps of working on my actual problem before needing to come back to the API layer and change something. Instead of writing a few lines of OpenGl calls, I would have to spend a bunch of time thinking about the type design, making sure the API surface was right, debugging generating code, going and updating all of the old call sites, etc.
However, I should note that I didn't remove all of the comptime from my graphics programming. For example, while programs and VAOs are now concrete types, buffers are still generic. I still get the benefit of types when writing the data to the buffer, and the definitions for binding the fields still get the benefits of being generated from fields and offsets, but in a much more flexible way:
media.vao.beginInit();
media.verts.vertexAttribField(.pos, false);
media.verts.vertexAttribField(.color, false);
media.vao.endInit();
So, if my previous experience is anything to go by, I have doubts on some proposals in this thread:
- Parsing SQL: Handling all of the cases as the SQL gets complicated sounds like a massive headache. The answer here seems to be "just don't handle them", which I feel will be hit much more often than expected.
- Generating SQL: This much more directly has the problems I talked about above, and the same issue that once it gets complicated you end up in working-on-the-API-layer land instead of actually working on your problem.
But actually, I think my real takeaway is "Go a bit crazy with it, see what works, but plan to pull back to what was actually useful."
Taking a few minutes to look at the SQLite API, I do have a direction I would try when writing a Zig API. I identify two priorities:
- Prepared statements look like they're important for performance. Also worth pointing out that parameterized queries are how SQL injection attacks are avoided, so we'll also definitely want any queries to be declared statically at compile time.
- It should be super simple to actually write SQL. I don't want to hit inflection points in my usage where having a join or subquery or something suddenly means that I have to pull back to a different API. Real-world queries tend to be a lot more complicated than SELECT id, origin, channel, author, body FROM messages.
From those priorities, where I'd start is an API that looks like this:
Queries.zig
messages: Query(
    "SELECT m.id, m.origin, u.name, m.body FROM messages m INNER JOIN users u ON m.author = u.id WHERE m.channel = ? ORDER BY m.date ASC LIMIT 100",
    struct {
        channel: u64,
    },
    struct {
        id: u64,
        origin: u64,
        author: []const u8,
        body: []const u8,
    },
),
main.zig
pub fn main() !void {
    var server: SQLiteServer(@import("Queries.zig")) = try .connectAndPrepare(...);
    // app code
    ...
}

fn loadMessages(allocator: Allocator, channel: u64) !void {
    var row_iterator = try server.begin(.messages, .{ .channel = channel });
    defer row_iterator.release();
    while (try row_iterator.next(allocator)) |row| {
        defer allocator.free(row.author); // needed?
        defer allocator.free(row.body); // needed?
        // etc
    }
}
Basically, a connection to the server starts by preparing all of the queries the app will ever use. There are some notes about how multithreading and such work; without taking the time to dig into them I don't know the best way to handle it. Probably something like having the prepared statements be thread-local. I would use debug.SafetyLock on prepared statements to check against double use.
I'm a bit unsure about how to handle memory for byte blobs. Again I would have to dig into the details. If byte blobs have a lot of options that code will want to choose between, I'd probably switch out the struct types in the query definition for enums or something, then have getter methods closer to the OP for the different usages. Edit: I would probably start with iterator.next() and iterator.nextDupe(allocator) methods, where dupe causes things to be owned, and the plain next's values become invalid once next is called again. If you have a mix of needs, then call dupe on the fields you need yourself. In that case you shouldn't be using the structs, which would have dangling pointers anyway.
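For what it's worth, a minimal sketch of what that Query helper could expand to (all names here are assumptions, not an existing API): it only bundles the comptime SQL text with the parameter and result types, which is enough for SQLiteServer to prepare each declared query once and type-check begin()/next() against it:

pub fn Query(
    comptime sql: [:0]const u8,
    comptime Params: type,
    comptime Row: type,
) type {
    return struct {
        pub const text = sql;
        pub const ParamsType = Params;
        pub const RowType = Row;
        // filled in once by connectAndPrepare(); a debug.SafetyLock could sit
        // next to it to guard against double use
        stmt: ?*anyopaque = null,
    };
}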
Writing a Zitron SQLite parser would be simpler than usual, due to its heritage. Zitron includes the actual SQLite parser as sample code for validating lemon.zig's behavior; someone could take this (public domain) code and replace all the C in it with Zig-specific actions. Everything but the C should be readily understood by Zitron, and the parser tables should be identical, rather than equivalent.
What @kristoff is doing is a case of what we in langsec call "shotgun parsing". In his case there's nothing wrong with that at all, for two reasons: it's happening during compilation, and (as a consequence) an "attacker-controlled" string can't get anywhere near the logic. Loris is careful in several places to note the limitations and specific circumstances in which he does this. I think it's clever, 10/10 no notes.
Shotgun parsing more generally has a bad reputation, especially when it gets anywhere near a string from "the wild", and parsing in general has a way of starting in one specific place and growing until wild strings can reach the parser, often in surprising ways.
For the "doing cool stuff with Zig, SQLite, and Zitron" concept, there's certainly potential there. At issue is that a lot of the "cool stuff" involves going from Zig code to SQL strings, which is something a parser can't really help with. It could verify the syntax, but SQLite will do that for you.
For anything which starts with an SQL string, though, yeah, could do some good stuff there. That can include validating the structure of an arbitrarily-complex query against expected Zig types, although SQLite is dynamically typed so that has to be borne in mind with regard to anything returned.
Slightly off topic:
The SQL-string versus code reminds me of my job (which I quit btw) using C#.
I made my own metadata "persistence" tool extracting stuff from the database, and only used SQL strings if my tool was not advanced enough to extract the needed data. The tool was written mainly for database conversions: read-convert-write.
I also had a brief look at the EntityFramework shipping with C# and even that could very easily crash.
Up until now I have never encountered something that really can "translate" code to SQL and back in a logical, consistent, fast, crash-free way. As programmers, I think we do not like to write SQL statements.
Besides that: it is a very interesting area.
I never had Entity Framework crash on me, except in the sense that it's possible to write queries that either cannot be translated to SQL at all, or perhaps that the engine just didn't support, in which case it would throw an exception. Needless to say, that's an annoying way to develop, especially if an updated query was faulty and made it through QA. You can catch the exception, but the code still won't work.