To get more practice with Zig 0.15.1 I have written a little parser for standalone XML. It uses the new Io.Reader with its buffer, and also some “programming without pointers”.
This is a neat project, kudos.
Some feedback: it’s not totally clear why the Parser has an allocator and also takes an allocator. I can guess: the Parser uses the initialized allocator for internal allocations, and the passed allocator for anything returned to the user. But it’s good to document these things.
Also, it looks as though the Parser only frees memory during deinit. If so, it might make sense to use an Arena internally and just free that. My sense is that one is supposed to deinit the parser after each XML parse? This would also be a good thing to document.
In general, you’re making life easier for users if everything marked pub has a doc comment. “Users” includes you, if you ever come back to the code after a few months of not looking at it, which is likely.
Last bit: it seems like this and this are doing basically the same thing, so you could factor that out into an inner parse function and replace the if trees with a switch. Tough call, because I see that they’re doing similar but not identical things, but you could create a RawParse union and get a clean separation between parsing, and packaging it up into an XmlItem. Just a thought.
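To make the RawParse idea concrete, here is a minimal sketch. Everything in it is hypothetical and only for illustration: the field names of `RawParse` and `XmlItem`, and the helpers `parseRaw` and `package`, are my assumptions, not xmlread's actual definitions. It assumes the input has already been split into tokens.

```zig
const std = @import("std");

// Hypothetical types for illustration; not xmlread's real definitions.
const RawParse = union(enum) {
    start_tag: []const u8,
    end_tag: []const u8,
    text: []const u8,
};

const XmlItem = union(enum) {
    element_start: []const u8,
    element_end: []const u8,
    char_data: []const u8,
};

/// Step 1: recognize what a token is, without packaging it for the user.
fn parseRaw(token: []const u8) RawParse {
    if (token.len >= 3 and std.mem.startsWith(u8, token, "</") and token[token.len - 1] == '>') {
        return RawParse{ .end_tag = token[2 .. token.len - 1] };
    }
    if (token.len >= 2 and token[0] == '<' and token[token.len - 1] == '>') {
        return RawParse{ .start_tag = token[1 .. token.len - 1] };
    }
    return RawParse{ .text = token };
}

/// Step 2: turn the raw result into the public item with one switch.
fn package(raw: RawParse) XmlItem {
    return switch (raw) {
        .start_tag => |name| XmlItem{ .element_start = name },
        .end_tag => |name| XmlItem{ .element_end = name },
        .text => |chars| XmlItem{ .char_data = chars },
    };
}
```

The switch is exhaustive, so adding a new `RawParse` variant becomes a compile error until `package` handles it.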
All in all, this looks like the start of a solid little SAX-style XML parser!
Thanks a lot for your feedback. Yes, there is no documentation so far, and you have encouraged me to actually write it.
Exactly.
Yes, deinit should be called after parsing is finished. My idea was to not force the user to pass in an arena when creating the Parser. But yes, I could still use an arena internally instead of the complicated deinit. However, the internal allocator dupes and frees during many of the next() calls, and I think I would have to be careful about the order of the frees, because an arena can only reclaim its most recent allocation.
Thanks for the suggestion. This is a good idea. I will try to do that. Maybe it’s possible.
I don’t know if this is SAX-style. I had the impression that SAX generates “XML events” which are passed to a callback function provided by the user. I think of my parser as more StAX-style. That’s why I used the name XmlItem instead of XmlEvent for what Parser.next() returns. But I am not a professional programmer, so maybe StAX is just a sub-style of SAX?
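To illustrate the difference I mean, here is a small sketch of both styles over a pre-split list of items. The names `PullSource`, `pushAll` and `countItem` are made up for this example and have nothing to do with xmlread's real API.

```zig
const std = @import("std");

const Item = []const u8;

/// Pull (StAX-like): the caller drives the loop and asks for the next item.
const PullSource = struct {
    items: []const Item,
    pos: usize = 0,

    fn next(self: *PullSource) ?Item {
        if (self.pos >= self.items.len) return null;
        const item = self.items[self.pos];
        self.pos += 1;
        return item;
    }
};

/// Push (SAX-like): the source drives the loop and calls back into user code.
fn pushAll(items: []const Item, count: *usize, handler: *const fn (*usize, Item) void) void {
    for (items) |item| handler(count, item);
}

/// Example user callback for the push style: it just counts the items.
fn countItem(count: *usize, item: Item) void {
    _ = item;
    count.* += 1;
}
```

With the pull style the user writes `while (src.next()) |item| { ... }` and keeps their own state in local variables. With the push style, state has to travel through the callback's context argument.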
You’re thinking about it in the right way.
I tend to think that, especially with textual processing, when we can trade some space for time, it’s a good tradeoff. Using an arena (an internal one) means that resets are fast the first time, and then basically instant after that. It does mean that the parser will retain the maximum amount of memory which it uses, unless you offer some interface for actually freeing the arena rather than just resetting it.
More than that, every reuse of the parser on a document that is approximately the size of the original one or smaller will do no actual allocation at all. This can be a big deal in terms of speed.
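A rough sketch of what I have in mind, using `std.heap.ArenaAllocator` as it exists in the standard library as of 0.15. The `Parser` here is a hypothetical skeleton, not xmlread's actual struct, and `dupeName`, `reset` and `shrink` are names I made up for illustration.

```zig
const std = @import("std");

/// Hypothetical parser skeleton; not xmlread's actual Parser.
const Parser = struct {
    arena: std.heap.ArenaAllocator,

    fn init(backing: std.mem.Allocator) Parser {
        return .{ .arena = std.heap.ArenaAllocator.init(backing) };
    }

    /// Return everything to the backing allocator.
    fn deinit(self: *Parser) void {
        self.arena.deinit();
    }

    /// Get ready for the next document while keeping the memory for reuse.
    fn reset(self: *Parser) void {
        _ = self.arena.reset(.retain_capacity);
    }

    /// Actually give the retained memory back, for users who care about footprint.
    fn shrink(self: *Parser) void {
        _ = self.arena.reset(.free_all);
    }

    /// Internal scratch allocation; no matching free is needed.
    fn dupeName(self: *Parser, name: []const u8) ![]u8 {
        return self.arena.allocator().dupe(u8, name);
    }
};
```

Usage would be `parser.reset()` between documents and `parser.deinit()` at the very end. Individual frees inside next() stop mattering, which also sidesteps the question of free order.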
If you refactor with a goal to reuse memory where you can, so that frees become less important, I suspect you could get pretty far with that.
When to optimize, and by how much, is always an open question though.
I had never heard of a StAX parser, so TIL. I always favor pull-based streaming interfaces, and as I think about it, the XML libraries I’ve used have been pull-based as well, not actually evented.
Anyway, neat project, I look forward to seeing where it goes!
Thanks, reusing the parser for several documents was not yet on my mind.
thanks for sharing your project! i tried it out and it worked for my simple task. i am new to zig and have been messing around with it a bit when i can, so please forgive me if my question is not idiomatic zig. i was curious if you were going to make a shared library version so i can dynamically link (vs statically link) to xmlread? in my project, i am dynamically linking to other libraries which i created, as i need to keep my binary size small. mahalo!
Hi @zangi. Welcome to Ziggit.
A shared library is not planned. I have never tried to do that and I don’t know exactly what changes would be necessary. For example, would some structs and functions have to be marked with export or extern?
As far as I know the idiomatic way is to import the source module directly and not create a static or dynamic library, except to interface to C/C++. But I am not a Zig expert, so maybe there are reasons.
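For what it's worth, here is a minimal sketch of what an exported function looks like in general. It is not part of xmlread: `example_count_open_brackets` is a made-up function, and a real wrapper would need a C-compatible interface around the Parser, which I have not designed.

```zig
const std = @import("std");

// Hypothetical C-ABI entry point; xmlread does not provide this.
// `export` makes the symbol visible in the library and uses the C calling convention.
export fn example_count_open_brackets(ptr: [*]const u8, len: usize) usize {
    const input = ptr[0..len];
    return std.mem.count(u8, input, "<");
}
```

As far as I understand, a file like this can be built into a shared library with `zig build-lib -dynamic`, but only types with a C-compatible layout can cross that boundary, so slices and error unions would have to be wrapped.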
Version 0.1.0
More XML encodings: UTF-16 and ISO-8859-* (besides UTF-8)
xmlread updated to Zig 0.16.0.