Lately I’ve been thinking about a form of function purity which I’d like to call semipure functions. A semipure function gives no guarantee about its output the first time it is called, but calling it again with the same parameters is guaranteed to produce the same output as the first call.
To give an example, imagine a web server whose handlers are semipure. When an endpoint is called, the web server passes the handler an additional parameter, maybe a context parameter or a special allocator, which is used to guarantee semipurity. Let’s call this the semipure parameter.
When the handler makes an SQL call, the results are saved in the semipure parameter. Querying the same rows in the same table again during the same execution makes a new SQL call and adds a second version of those rows to the semipure parameter.
If the handler is executed again with the same semipure parameter, it makes no new SQL calls. Instead, it loads the data from the semipure parameter. This guarantees that the subsequent call does not depend on the state of the database, so the handler follows the same code path again.
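Here is a minimal sketch of that record-and-replay behaviour, assuming a recent Zig version. Everything in it is made up for illustration: the `Semipure` struct logs one integer per query instead of rows, `fetchBalance` stands in for an SQL call, and `rewind` is a hypothetical way to start a second execution with the same semipure parameter.

```zig
const std = @import("std");

// Hypothetical, simplified semipure parameter. It logs one integer per
// query; a real one would log the rows each query touched.
const Semipure = struct {
    results: [16]i64 = [_]i64{0} ** 16,
    len: usize = 0, // number of logged results
    cursor: usize = 0, // index of the next result to replay

    /// Rewinds the cursor so the next execution replays from the start.
    fn rewind(self: *Semipure) void {
        self.cursor = 0;
    }

    /// Replays the logged result at the cursor if there is one. Otherwise
    /// calls `fetch`, which stands in for a real SQL call, and logs it.
    fn query(self: *Semipure, fetch: *const fn () i64) error{LogFull}!i64 {
        if (self.cursor == self.len) {
            if (self.len == self.results.len) return error.LogFull;
            self.results[self.len] = fetch();
            self.len += 1;
        }
        const value = self.results[self.cursor];
        self.cursor += 1;
        return value;
    }
};

// Stand-in for mutable database state.
var balance: i64 = 100;

fn fetchBalance() i64 {
    return balance;
}

// Hypothetical handler whose result depends on the database.
fn handler(semipure: *Semipure) !i64 {
    const current = try semipure.query(fetchBalance);
    return current * 2;
}

test "second execution replays the log" {
    var semipure = Semipure{};
    const first = try handler(&semipure); // hits the "database"
    balance = 5; // the database changes in between
    semipure.rewind();
    const second = try handler(&semipure); // replays, ignores the change
    try std.testing.expectEqual(first, second);
}
```

Running this with `zig test` should pass: the second execution never calls `fetchBalance`, so the changed balance has no effect on the result.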
The main benefit comes into play when you can serialize and deserialize this semipure parameter. If the web server detects that the handler encountered an error, it could serialize the semipure parameter and save it to disk. Later, you could load the semipure parameter from disk to restore the relevant database state, replay what happened, and modify the handler until the bug is fixed. You could even create an automatically growing test suite that covers every single failed request your web server has ever encountered.
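To make the serialization step a bit more concrete, here is a rough sketch assuming a recent Zig version. The `Log` struct, the byte layout (one length byte followed by little-endian 64-bit values), and the `serialize`/`deserialize` names are all assumptions for illustration. A real semipure parameter would have to encode rows and query metadata, and the resulting bytes would still need to be written to disk.

```zig
const std = @import("std");

// Hypothetical, simplified log: up to 8 recorded query results.
const Log = struct {
    values: [8]i64 = [_]i64{0} ** 8,
    len: u8 = 0,
};

// One length byte followed by 8 bytes for each of the 8 slots.
const encoded_size = 1 + 8 * 8;

/// Encodes the log as a length byte plus little-endian 64-bit values.
fn serialize(recorded: Log) [encoded_size]u8 {
    var out = [_]u8{0} ** encoded_size;
    out[0] = recorded.len;
    for (recorded.values[0..recorded.len], 0..) |value, i| {
        const bits: u64 = @bitCast(value);
        var b: usize = 0;
        while (b < 8) : (b += 1) {
            out[1 + i * 8 + b] = @truncate(bits >> @intCast(8 * b));
        }
    }
    return out;
}

/// Decodes bytes produced by `serialize`; rejects an impossible length.
fn deserialize(bytes: [encoded_size]u8) error{Corrupt}!Log {
    if (bytes[0] > 8) return error.Corrupt;
    var restored = Log{ .len = bytes[0] };
    for (restored.values[0..restored.len], 0..) |*slot, i| {
        var bits: u64 = 0;
        var b: usize = 0;
        while (b < 8) : (b += 1) {
            bits |= @as(u64, bytes[1 + i * 8 + b]) << @intCast(8 * b);
        }
        slot.* = @bitCast(bits);
    }
    return restored;
}

test "a log survives a round trip" {
    var recorded = Log{};
    recorded.values[0] = 200;
    recorded.values[1] = -7;
    recorded.len = 2;
    const restored = try deserialize(serialize(recorded));
    try std.testing.expectEqualSlices(
        i64,
        recorded.values[0..recorded.len],
        restored.values[0..restored.len],
    );
}
```

Each saved byte array could then become one entry in the growing test suite: deserialize it, run the handler against it, and check that the handler no longer fails.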
This method does rely on knowing the state of every queried row in the database. For simple queries this is fine, but if you introduce subqueries that do not return every touched column and row, you lose this information.
I think Zig makes this endeavor somewhat possible through comptime shenanigans. If one were to create an ORM that allows creating new types that are built up from existing database types, you could encode partial queries and queries over multiple tables inside the ORM. To give a crude example:
const User = struct {
    id: ID,
    firstName: []const u8,
    lastName: []const u8,
    balance: i64,

    pub const ID = enum(u32) { _ };
};

const Address = struct {
    id: ID,
    userId: User.ID,
    city: []const u8,
    street: []const u8,

    pub const ID = enum(u32) { _ };
};

// DBType, Semipure, SqlClient, Request and Response belong to the imagined ORM and server.
const UserDB = DBType.fromType(User);
const AddressDB = DBType.fromType(Address);

const UserAddressDB = UserDB.join(AddressDB, User.ID).select(.{
    .userId = UserDB.fields.id,
    .firstName = UserDB.fields.firstName,
    .lastName = UserDB.fields.lastName,
    .city = AddressDB.fields.city,
});
const UserAddress = UserAddressDB.toType();

// Imagine this handler is supposed to do something more complex than simply retrieve some data
pub fn getUserAddress(semipure: Semipure, sqlClient: SqlClient, request: Request) !Response {
    // `var` rather than `const`, because the city may be overwritten below.
    var userAddress = try sqlClient.getOne(semipure, UserAddressDB, .{
        UserDB.equals(UserDB.fields.id, request.userId),
    });
    // userId is a User.ID enum, so convert it before comparing with an integer.
    if (@intFromEnum(userAddress.userId) == 3) {
        userAddress.city = "Atlantis";
        try sqlClient.set(semipure, userAddress);
    }
    return Response.encodeJson(userAddress);
}
Calling getUserAddress with userId = 1 might log the following data in the semipure parameter (a `-` marks a column that was not fetched):
| Query 1 | GET | | |
|---|---|---|---|
| User | | | |
| id | firstName | lastName | balance |
| 1 | Bob | Smith | - |
| Address | | | |
| id | userId | city | street |
| 1 | 1 | Paris | - |
Calling getUserAddress with userId = 3 might log the following data in the semipure parameter:
| Query 1 | GET | | |
|---|---|---|---|
| User | | | |
| id | firstName | lastName | balance |
| 3 | Mark | Jones | - |
| Address | | | |
| id | userId | city | street |
| 3 | 3 | London | - |

| Query 2 | UPDATE | | |
|---|---|---|---|
| User | | | |
| id | firstName | lastName | balance |
| 3 | Mark | Jones | - |
| Address | | | |
| id | userId | city | street |
| 3 | 3 | Atlantis | - |
The address id was not explicitly included in UserAddressDB, but the system might have figured out that it is the primary key of the address table and therefore included it anyway. I’m not entirely certain how this would work internally. Maybe a comptime error would get emitted if these fields are not included in the type?
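The comptime error variant could look roughly like the sketch below. It assumes, purely for illustration, that the primary key is always a field named `id`. The helpers `hasPrimaryKey` and `requirePrimaryKey` and the cut-down `Address` and `CityOnly` types are hypothetical; a real ORM would have to check the fields of the selected type against the table's actual key.

```zig
const std = @import("std");

/// Hypothetical check: does `T` carry the field this sketch treats as
/// the primary key?
fn hasPrimaryKey(comptime T: type) bool {
    return @hasField(T, "id");
}

/// Fails compilation with a readable message if the key is missing.
fn requirePrimaryKey(comptime T: type) void {
    if (!hasPrimaryKey(T)) {
        @compileError(@typeName(T) ++ " must include the primary key field 'id'");
    }
}

// Cut-down stand-ins for a table type and a partial selection.
const Address = struct {
    id: u32,
    city: []const u8,
};

const CityOnly = struct {
    city: []const u8,
};

comptime {
    requirePrimaryKey(Address); // compiles
    // requirePrimaryKey(CityOnly); // uncommenting this fails the build
}

test "only Address carries the key" {
    try std.testing.expect(hasPrimaryKey(Address));
    try std.testing.expect(!hasPrimaryKey(CityOnly));
}
```

The alternative from the paragraph above, silently adding the key to the selection, would instead mean generating a new struct type at comptime that includes the extra field.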
Do you guys have any ideas/feedback on such a system?