Http://blog.huang-wei.com/2010/07/20/%e5%8f%8c%e6%95%b0%e7%bb%84%e5%ad%97%e5%85%b8%e6%a0%91%e7%9a%84%e5%86%85%e5%ad%98%e5%8d%a0%e7%94%a8%e6%b5%8b%e8%af%95/
The previous article introduced the dual array dictionary tree datrie. Now let's simply test the memory usage.
Test Case. I chose the Holy Bible, and the data file size is 4.2 MB. Only English words are recorded, and all words are converted to lowercase letters.
I have made some optimizations to the implementation of trie. Initially, the size of the pointer array for each node is 0. When a node is inserted, the array of max (size, char) size is opened. Trie-MEM shows that the node size has been removed, that is, the value reflects the total size of the requested pointer array.
Trie-MEM/PTR-size/nodes = 9.1, indicating that each node (inner node + leaf node) is allocated an average of 9.1 pointers. The trie tree has saved a lot of space. However, this is obviously not accurate enough to calculate the amount of waste. nodes should be replaced with the number of inner nodes (Here we use U-words to replace leaf nodes, although the two are not the same ), because the leaf node does not allocate a pointer array, and the actually useful transfer edge should be subtracted. This waste value should be (Trie-MEM/PTR-size-nodes)/(nodes-u-words) = 12.8.
Datrie's waste value should be (datrie-MEM/(2 * int-size)-nodes)/(nodes-u-words)-1 = 1.2, it can be seen that the space complexity of datrie is quite good. Of course, I have not thoroughly optimized the implementation of datrie, And it is basically a test of the Code in the previous article. If the optimization method mentioned in the article continues to be optimized, the waste of space will be lower.
However, there is a big problem with datrie, that is, its space is pre-applied, because there is no way to determine its actual size. If the space is not large enough, re-allocate it, it will inevitably consume time, and it still cannot solve the problem of sufficient space. In addition, it is recommended that the additional information fields be saved as pointers. Otherwise, the replication complexity may be high during rearrangement.
In summary, datrie is suitable for engineering applications, especially for fixed datasets.