After spending some time reading through the x86-64 backend’s code, it has occurred to me that inline asm statements could accept an aggregate value as their template. Here is a rough draft of what the syntax could look like, using a wrapper around cpuid on x86-64:
pub fn cpuid(leaf: u32, sub_leaf: u32) struct { u32, u32, u32, u32 } {
    return asm (.{ .x86_64 = .{
        .arguments = &.{ .eax, .ecx },
        .result = &.{ .eax, .ebx, .ecx, .edx },
        .instructions = &.{.cpuid},
    } }, leaf, sub_leaf);
}
The first argument is the comptime template, which would be of a new type std.lang.Assembly. The remaining arguments correspond to the arguments field. The result value is either a single value when there is one result, or a tuple when there are multiple.
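For comparison, here is roughly how the same wrapper looks in the text-based syntax Zig accepts today. This is only a sketch: the name cpuidTextual is made up for this comparison, and the block only builds for x86 and x86-64 targets.

```zig
const std = @import("std");

// Hypothetical name; existing text-based inline asm, shown for comparison only.
pub fn cpuidTextual(leaf: u32, sub_leaf: u32) struct { u32, u32, u32, u32 } {
    var eax: u32 = undefined;
    var ebx: u32 = undefined;
    var ecx: u32 = undefined;
    var edx: u32 = undefined;
    // Each output is pinned to a register with a "={reg}" constraint string,
    // and each input with "{reg}".
    asm volatile ("cpuid"
        : [_] "={eax}" (eax),
          [_] "={ebx}" (ebx),
          [_] "={ecx}" (ecx),
          [_] "={edx}" (edx),
        : [_] "{eax}" (leaf),
          [_] "{ecx}" (sub_leaf),
    );
    return .{ eax, ebx, ecx, edx };
}
```

In the draft syntax, the four temporaries and the constraint strings would collapse into the arguments and result fields.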
A more advanced example which calculates triangular numbers could look like:
pub fn triangular(n: usize) usize {
    return asm (.{ .x86_64 = .{
        .arguments = &.{.{ .register = .count }},
        .result = &.{.{ .register = .sum }},
        .clobbers = &.{ .{ .register = .count }, .eflags },
        .instructions = &.{
            .{ .xor = .{ .dst = .{ .register = .sum }, .src = .{ .register = .sum } } },
            .{ .label = .loop },
            .{ .add = .{ .dst = .{ .register = .sum }, .src = .{ .register = .count } } },
            .{ .dec = .{ .register = .count } },
            .{ .jnz = .{ .label = .loop } },
        },
    } }, n);
}
Here the registers for the argument and result are allocated by the compiler and referenced via @EnumLiteral()s. The jump instruction also uses an @EnumLiteral() to reference a label created by the label pseudo-instruction. I am not sure that this is how std.lang.Assembly would actually look. For example, labels could be replaced with indexes into the instructions slice. This is just an example of what is possible.
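To make the intent of the instruction list concrete, here is a plain Zig sketch of the loop it encodes, with each statement annotated with the instruction it mirrors. The name triangularReference is hypothetical, and the assert reflects my reading that the loop body runs before the count is tested, so the example assumes n is at least 1.

```zig
const std = @import("std");

// Hypothetical reference version of the asm loop, written in ordinary Zig.
fn triangularReference(n: usize) usize {
    std.debug.assert(n >= 1); // the body executes once before the first test
    var sum: usize = 0; // xor sum, sum
    var count = n;
    while (true) {
        sum +%= count; // add sum, count (wraps like the hardware add)
        count -= 1; // dec count
        if (count == 0) break; // jnz loop
    }
    return sum;
}
```

For n = 4 this adds 4 + 3 + 2 + 1.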
I see a few advantages with this approach over the accepted syntax in #10761:
- It avoids abstracting one syntax across architectural differences
- The std.lang.Assembly type is self-documenting
- Keeps the language simpler
Migration could also be incremental: a legacy union field would accept the old text-based syntax, and zig fmt could automatically convert existing asm statements to use it.
Here is what std.lang.Assembly would look like for the above two functions to work:
pub const Assembly = union(enum) {
    x86_64: X86_64,
    pub const X86_64 = struct {
        arguments: []const Operand = &.{},
        result: []const Operand = &.{},
        clobbers: []const Operand = &.{},
        instructions: []const Instruction,
        pub const Operand = union(enum) {
            eax: void,
            ebx: void,
            ecx: void,
            edx: void,
            eflags: void,
            register: @EnumLiteral(),
            label: @EnumLiteral(),
        };
        pub const Instruction = union(enum) {
            label: @EnumLiteral(),
            add: Binary,
            xor: Binary,
            dec: Operand,
            jnz: Operand,
            cpuid: void,
            pub const Binary = struct {
                dst: Operand,
                src: Operand,
            };
        };
    };
};
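As for the alternative of using indexes into the instructions slice instead of named labels, a name could be resolved to an index with something like the sketch below. Everything here is an assumption for illustration: the cut-down Instruction type stands in for the one above, labels are plain strings so the block builds without any new language support, and labelIndex is a hypothetical helper.

```zig
const std = @import("std");

// Hypothetical, reduced stand-in for the proposed Instruction union.
const Instruction = union(enum) {
    label: []const u8,
    jnz: []const u8,
    other,
};

// Returns the position of the label pseudo-instruction called `name`,
// or null when the template does not define it.
fn labelIndex(instructions: []const Instruction, name: []const u8) ?usize {
    for (instructions, 0..) |inst, i| {
        switch (inst) {
            .label => |label_name| if (std.mem.eql(u8, label_name, name)) return i,
            else => {},
        }
    }
    return null;
}
```

A backend could run a lookup like this for every jump and reject the template when it returns null, which would catch a misspelled label at compile time.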