Gaddag optimization

A little related tail to this… I am using the “gaddag” structure in my program, which takes around 100 MB in memory.

The makers of the macondo scrabble engine say that their gaddag fits in around 5 MB. Some words about the structure are here:

I cannot follow and or believe if you add nodes in “gaddag” way this fits in 5 MB for lets say 250.000 words.
(The word CARE would be entered here as ERAC, RAC@E, AC@RE, C@ARE)
Does anyone understand if it is true and how?

I only had a rough look at the text, but it seems very highly optimized.

Different from other implementations of similar structures, it is allowed to share the tail end of a node.

This makes me think that it may save on a lot of nodes through structural sharing.

It also seems to me that the “Kurnia Word Graph” may be a different graph and not really be a gaddag, but that is just a hunch, finding that out would require reverse engineering the code and I don’t really want to stare at rust code.

I think you would have to find out how many words where actually in the word list, the word list may have been a lot smaller.

Or use your own word list to build it and see if it is way bigger with that.

I know these source lists are around 250.000 words.
And yes: I am 20 times bigger. That is why I cannot believe it. Or they use some insane completely different structure. I will have to check it out…

edit: ah, i missed your previous post. No… I also don’t want to stare at Rust. I will focus on my engine for now and keep smart gaddags for later.

Using standard pointers will bloat the structure. The original implementation rolled its own pointers. If memory serves from 30 years ago, 22 bits were used for pointing within its own large array, while the other 10 bits encoded other essential information. I recall having about 100K words encoded. I believe the SPE article provides lots of clues about this.

Also, there are lots of duplicate nodes, so FSA minimization is very effective.

Depending on the lexicon, 22 bit addresses might be insufficient for the unminimized version, so you may have to use standard pointers and then minimize it before hand-rolling a small structure that can easily stay in memory.

Steven Gordon, PhD

1 Like