Build your own text editor in Zig (the mdbook)

When I posted about Port of kilo editor to zig, I said I wanted to write a guide similar to the original Build Your Own Text Editor.

Now I finished it and it’s here. Comments welcome.

Repo with source code

The code branch has the source code, divided by chapters, so you can also go through it without typing the code yourself; the repo's readme has tips on how to do that.

If you read it and find mistakes, oversights, or bad/wrong explanations of language concepts/features, please tell me; I don't want to spread bad habits/notions.

In particular, there are a couple of 'digressions' about the language whose validity needs checking, for example:

panic
comptime
default initializers for structs, undefined
assignments
casting numbers
ArrayList strategies
inline

Thanks.


this looks like something fun to read during my weekends 😎

Just finished ziglings and this is exactly what I was looking for next. Thanks for this!

Kept forgetting to check this lol. I'm only checking what you asked about, as it's quite long

panic handler

When the program encounters an error at runtime, depending on the kind of error, two things may happen:

  • the program crashes (best case)
  • the program keeps running, but its state is corrupted (worst case)

In the second case really nasty things can happen, so we want to avoid bugs at all costs. In safe release modes (Debug and ReleaseSafe), events that would normally cause a crash or undefined behavior cause a panic instead. The program terminates and you get a meaningful stack trace of what has caused the error.

Close enough, but a more accurate description:

A panic is a termination of the program indicating a fatal error.
Panics can be triggered by explicitly calling `@panic(msg)`,
`std.debug.panic(fmt, args)` (for a formatted message),
or any of the functions in `std.builtin.panic`,
which is the actual panic handler that you can override.

Panics are also triggered by safety checks for illegal behaviour
(enabled in `Debug` and `ReleaseSafe` modes).
Not all illegal behaviour is or can be checked,
and there isn't an exhaustive list **YET**.
However, much of this behaviour is defined, though it shouldn't be relied upon.
Undefined behaviour is a subset of illegal behaviour.
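To make the safety-check part concrete, here is a small sketch using integer overflow, one of the checked kinds of illegal behaviour. The helper names `checkedAdd` and `wrappingAdd` are made up for illustration; `std.math.add` and the wrapping operator `+%` are the standard ways to opt out of the panic by making overflow either an error or defined behaviour.

```zig
const std = @import("std");

// Hypothetical helper: adds two byte counts without risking a panic.
// A plain `a + b` that overflows u8 is illegal behaviour: in Debug and
// ReleaseSafe the safety check panics, in ReleaseFast/ReleaseSmall it is
// undefined behaviour.
fn checkedAdd(a: u8, b: u8) error{Overflow}!u8 {
    // std.math.add reports overflow as an error instead of panicking.
    return std.math.add(u8, a, b);
}

// Hypothetical helper: wrap-around is explicitly requested with `+%`,
// so there is no illegal behaviour and no check fires.
fn wrappingAdd(a: u8, b: u8) u8 {
    return a +% b;
}
```

With these, `checkedAdd(200, 100)` returns `error.Overflow` and `wrappingAdd(200, 100)` returns 44, while `@as(u8, 200) + 100` computed at runtime would panic in a safe build.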

You may have noticed that strange return type: noreturn. It means the function doesn't simply return anything, like a void would do; it doesn't return at all. This is so because when this function is called, our program has crashed already, and it couldn't return any value anyway. You shouldn't worry about it because it's the first and last time we'll see it in our program.

The program has not already crashed: the handler is called explicitly by the programmer, from a safety check, or from a segfault handler that's set up by default.

The return type is noreturn because it’s not supposed to return, as panics are supposed to indicate a fatal error.
The default panic handler terminates the program itself; the other option is an infinite loop, often used for embedded.
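One practical consequence of `noreturn` worth seeing once: a `noreturn` expression coerces to any type, so a call to a panicking function can sit in a branch that is otherwise expected to produce a value. A minimal sketch; `fatal` and `rowAt` are hypothetical names, not from the guide.

```zig
const std = @import("std");

// Hypothetical helper for unrecoverable states. Its return type is
// `noreturn`: control never comes back to the caller.
fn fatal(comptime fmt: []const u8, args: anytype) noreturn {
    std.debug.panic(fmt, args);
}

fn rowAt(rows: []const []const u8, index: usize) []const u8 {
    // `noreturn` coerces to any type, so the else branch type-checks as a
    // `[]const u8` even though it never produces a value.
    return if (index < rows.len) rows[index] else fatal("row {d} out of range", .{index});
}
```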

comptime

the relative code will be removed and will not be executed at runtime.

I think you meant ‘related’ not ‘relative’.
I know what you meant, and that is correct, but what you said is not.

A correct explanation would be:

‘The condition is removed, the chosen branch stays, the other branch (the rest of the function in this case) is removed’

Pedantic, but #ifdef doesn't have quite the same effect, though it does have the same practical use.
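A small sketch of "the condition is removed, the chosen branch stays, the other branch is removed". The flag `enable_tracing` is a hypothetical comptime-known constant (in a real program it could be a check on `@import("builtin").os.tag`). Because the condition is comptime-known and false, the branch is never analyzed, which is why even a `@compileError` inside it does not fire.

```zig
const std = @import("std");

// Hypothetical comptime-known flag (container-level const).
const enable_tracing = false;

fn traceMessage() []const u8 {
    if (enable_tracing) {
        // Never analyzed: the condition is comptime-known and false,
        // so this @compileError is not triggered.
        @compileError("tracing is not implemented");
    }
    // Only this code ends up in the compiled function.
    return "tracing disabled";
}
```

Unlike `#ifdef`, the dead branch still has to parse as valid Zig; it is just not semantically analyzed or emitted.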

But sometimes the compiler says:

error: unable to resolve comptime value

In these cases the comptime keyword might fix the issue.

That error means it cannot be evaluated at comptime, so comptime doesn’t magically fix it.

I think I know what you meant, but it’s not something you can describe concisely without an example.

It's worth stating that functions are not evaluated at comptime unless all parameters are required to be comptime-known, the return type requires being comptime-known (e.g. `type`), or you use the comptime keyword to force it.
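The return-type case is the easiest to see: a function returning `type` can only run at compile time, so calls to it are evaluated at comptime even from inside a runtime function, with no keyword needed. `Pair` and `sumPair` below are hypothetical names used only for this sketch.

```zig
const std = @import("std");

// Returns a `type`, which only exists at compile time, so every call to
// Pair is evaluated at comptime automatically.
fn Pair(comptime T: type) type {
    return struct {
        first: T,
        second: T,
    };
}

fn sumPair() u32 {
    // Pair(u32) is resolved at compile time; the struct value itself
    // and the addition are ordinary runtime code.
    const p = Pair(u32){ .first = 2, .second = 3 };
    return p.first + p.second;
}
```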

default field values

'Initializers' is the wrong term: it implies some logic is possible, which is false; they can only be comptime-known values.

A type having functions does not make it ‘complex’,
and a ‘complex’ type is irrelevant to default values.
I understand your logic, but it’s simply wrong.

The bar for default field values is whether the validity of the field's value depends on the values of other fields,
or the validity of other fields' values depends on that field.

For example

const Range = struct {
  // these fields depend on each other for validity
  // so they shouldn't have default values
  min: u8,
  max: u8,
  // this field does not depend on any other field to be valid
  // and no other field depends on it
  // so it can have a default value
  inclusive: bool = false,
};
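A runnable version of that `Range`, with a hypothetical `contains` method added purely to show the default in use. Omitting `inclusive` at the call site picks up `false`; omitting `min` or `max` is a compile error, which is the point: the interdependent fields must always be spelled out.

```zig
const std = @import("std");

const Range = struct {
    // No defaults: min and max are only valid relative to each other.
    min: u8,
    max: u8,
    // Independent of the other fields, so a default is fine.
    inclusive: bool = false,

    // Hypothetical method, added for illustration only.
    fn contains(self: Range, x: u8) bool {
        if (self.inclusive) return x >= self.min and x <= self.max;
        return x >= self.min and x < self.max;
    }
};
```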

Therefore you should have a really good reason to set an undefined default value inside structs.

My opinion is that there is no valid reason to have undefined as a default value. My reasoning: to make it more likely that whoever instantiates the type overrides the value before use, they should see undefined in their own code. When it is a default value, they are far more likely to overlook it.

Pedantic, but undefined doesn’t have to be used in an upper scope.
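A sketch of what "see undefined in their own code" looks like. `LineBuf` is a hypothetical type; neither field has a default (`data` and `len` depend on each other), so the caller has to write `.data = undefined` explicitly at the place where the buffer is created.

```zig
const std = @import("std");

// Hypothetical fixed-size line buffer. No defaults: which bytes of `data`
// are meaningful depends on `len`, and vice versa.
const LineBuf = struct {
    data: [16]u8,
    len: usize,

    fn append(self: *LineBuf, c: u8) void {
        // Indexing past 16 bytes would trip a bounds check in safe builds.
        self.data[self.len] = c;
        self.len += 1;
    }

    fn slice(self: *const LineBuf) []const u8 {
        // Only the first `len` bytes have been written.
        return self.data[0..self.len];
    }
};
```

Usage: `var buf = LineBuf{ .data = undefined, .len = 0 };` keeps the `undefined` visible to whoever reads the calling code.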

assignments (reader/writer interface foot guns too)

The program will panic at runtime (in safe builds!), and the error reported can be hard to understand. Unfortunately Zig documentation is still immature, so right now you’ll have to find out the hard way how these things work.

This is incorrect: there are no safety checks for pointer type casts (@fieldParentPtr counts), but there will be in the future.

What currently happens is undefined behaviour, even in safe modes; you are lucky if it crashes.
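To illustrate the foot gun, here is a minimal hypothetical interface built on `@fieldParentPtr` (the names `Counter` and `StepCounter` are made up; the builtin call uses the Zig 0.12+ form with the field name first). It works only while the interface field stays embedded in its parent. Copying it out, e.g. `var copy = sc.counter; _ = copy.next();`, makes `@fieldParentPtr` compute a bogus parent pointer, and as noted above nothing checks this, even in Debug.

```zig
const std = @import("std");

// Hypothetical interface: just a function pointer that receives the
// interface itself.
const Counter = struct {
    nextFn: *const fn (self: *Counter) u32,

    fn next(self: *Counter) u32 {
        return self.nextFn(self);
    }
};

const StepCounter = struct {
    current: u32,
    step: u32,
    // Embedded interface; only valid while it lives inside a StepCounter.
    counter: Counter,

    fn init(start: u32, step: u32) StepCounter {
        return .{ .current = start, .step = step, .counter = .{ .nextFn = &nextImpl } };
    }

    fn nextImpl(iface: *Counter) u32 {
        // Recover the parent from the field pointer. If `iface` points at a
        // copy outside a StepCounter, this is unchecked undefined behaviour.
        const self: *StepCounter = @fieldParentPtr("counter", iface);
        const value = self.current;
        self.current += self.step;
        return value;
    }
};
```

The safe pattern is to keep the parent in a stable location and only ever pass around `&sc.counter`.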

casting numbers

In this program, we don’t do any casting, but we don’t have to deal with floating point numbers either.

You don't need floating point numbers to do safe subtraction. In fact, floating point maths in general is less safe: limited precision produces slightly wrong values, and inf and nan are themselves an oversimplification of invalid states.

A trick you didn't mention is to use a slightly larger integer type, e.g. to subtract two u8 operands, do the maths in i9 and cast the result back. The operands coerce to i9 because i9 can represent every u8 value, and it can also represent every possible difference, positive or negative (-255 to 255, since the input values are bound to u8). Adding two u8 values can reach 510, which needs i10 (or u9) instead.
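A sketch of that trick applied to an editor-style operation: moving a cursor left without going below column 0. `moveLeft` is a hypothetical helper, and the single-argument `@intCast` (result type inferred from the return type) assumes Zig 0.11 or later.

```zig
const std = @import("std");

// Hypothetical cursor helper: move `pos` left by `n`, clamping at 0.
fn moveLeft(pos: u8, n: u8) u8 {
    // u8 coerces to i9 implicitly. i9 covers -256..255, which holds every
    // difference of two u8 values (-255..255), so this cannot overflow.
    const wide: i9 = @as(i9, pos) - @as(i9, n);
    if (wide < 0) return 0;
    // Here 0 <= wide <= 255, so casting back to u8 is always in range.
    return @intCast(wide);
}
```

No floats involved, and the only cast happens after the range has been established.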

ArrayList strategies

Only criticism is to be a little less concise.

inline

Other uses of inline are very different, because they usually allow loops to be evaluated at compile time. I’ve never used them, since I never felt the need for them, so I can’t tell you more.

Correct, and even worded well, but I feel like any concise statement of this feature is easily misunderstood, so here is a less concise one:

inline on loops evaluates only the loop iterations, not the body, at comptime. That does however mean the capture(s) |n,...| are comptime known to the body, which allows further comptime shenanigans and optimisations.
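A small sketch of that distinction. The loop over the array of types is unrolled at compile time, so the capture `T` is comptime-known and can be used where a type is required (`@sizeOf(T)`), while the body still runs at runtime, updating the ordinary variable `total`. `totalSize` is a hypothetical name.

```zig
const std = @import("std");

fn totalSize() usize {
    const types = [_]type{ u8, u16, u32 };
    var total: usize = 0;
    // Unrolled at compile time: each copy of the body gets a comptime-known
    // `T`. The additions to `total` are still ordinary runtime code.
    inline for (types) |T| {
        total += @sizeOf(T);
    }
    return total;
}
```

A plain `for` over `types` would not compile here, because a runtime loop capture cannot hold a `type`.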

Again, I only checked what you asked, so most of the guide is unchecked still. Perhaps I will follow up later if I feel like it.


Thank you very much, I’ll update the document as soon as I have time (next few days).

One example is at More comptime - Build your own text editor in Zig

Maybe it's because that function doesn't have comptime parameters, but what I observed was that error message, and that using comptime at the call site fixed it. That's why I wrote that it might fix the error, if you think the value could be evaluated at compile time. Other times comptime just can't happen, keyword or not.

Ok, it doesn't 'magically' fix it; I'll have to describe what happens a bit better (if I understood correctly, it's the concatenation operator that requires the value to be comptime-known, and that function isn't evaluated at comptime normally, so it needs comptime at the call site).
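A minimal reproduction of that situation, under the assumption that it matches the guide's case: `appName` is a hypothetical helper with no comptime parameters, so a plain call to it inside a function body is a runtime call, and `++` rejects a runtime operand. Forcing the call with `comptime` makes the slice comptime-known, and the concatenation compiles.

```zig
const std = @import("std");

// Hypothetical helper: nothing here forces comptime evaluation.
fn appName() []const u8 {
    return "kilo";
}

fn welcomeMessage() []const u8 {
    // `++` needs both operands comptime-known. Writing just `appName()`
    // here would be a runtime call and fail to compile with an error like
    // "unable to resolve comptime value"; `comptime` forces the call.
    return "Welcome to " ++ (comptime appName());
}
```

The concatenated string is produced at compile time and lives in static memory, so returning it as a slice is fine.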

that’s along the lines of what I thought you meant.

that is correct.