Testing the Memory Usage of the Double-Array Trie

Http://blog.huang-wei.com/2010/07/20/%e5%8f%8c%e6%95%b0%e7%bb%84%e5%ad%97%e5%85%b8%e6%a0%91%e7%9a%84%e5%86%85%e5%ad%98%e5%8d%a0%e7%94%a8%e6%b5%8b%e8%af%95/

The previous article introduced the double-array trie (datrie). This one runs a simple test of its memory usage.

Test case: I used the text of the Holy Bible; the data file is 4.2 MB. Only English words are recorded, and every word is converted to lowercase.

words      : 822,529    (total words)
u-words    : 12,591     (unique words)
nodes      : 34,266     (trie nodes)
trie-mem   : 1,247,308  (bytes)
datrie-mem : 483,376    (bytes)
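
As a rough illustration of how the words and u-words counts could be produced, here is a minimal sketch. It assumes that a "word" is a maximal run of ASCII letters; the original test may have tokenized the text differently, and the function name word_stats is made up for this example.

```python
import re


def word_stats(text):
    """Return (total_words, unique_words) for the given text."""
    # Assumption: a word is a maximal run of ASCII letters, after
    # lowercasing. Digits and punctuation act as separators.
    words = re.findall(r"[a-z]+", text.lower())
    return len(words), len(set(words))


sample = "In the beginning God created the heaven and the earth."
total, unique = word_stats(sample)
print(total, unique)
```

Feeding the whole data file through a function like this would give the two word counts; the node count and memory figures come from the trie itself.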

I made one optimization to the trie implementation. Each node's pointer array initially has size 0; when a child is inserted, the array is grown to max(size, char) entries, just enough to index the new character. trie-mem excludes the size of the node structures themselves, so the value reflects only the total size of the allocated pointer arrays.
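
The lazily grown pointer array can be sketched as follows. This is not the code used in the test, only a minimal model of the idea: the class names, the slots counter, and the choice to count the root as a node are all assumptions made for illustration, and the alphabet is assumed to be lowercase a-z.

```python
class Node:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = []     # pointer array starts with size 0
        self.terminal = False  # True if a word ends at this node


class Trie:
    def __init__(self):
        self.root = Node()
        self.nodes = 1  # assumption: the root counts as a node
        self.slots = 0  # total pointer-array entries allocated so far

    def insert(self, word):
        node = self.root
        for ch in word:
            idx = ord(ch) - ord("a")  # assumes lowercase a-z only
            need = idx + 1            # entries required to index idx
            grow = need - len(node.children)
            if grow > 0:
                # Grow the array only as far as this character requires.
                node.children.extend([None] * grow)
                self.slots += grow
            if node.children[idx] is None:
                node.children[idx] = Node()
                self.nodes += 1
            node = node.children[idx]
        node.terminal = True


t = Trie()
for w in ("ab", "ad", "b"):
    t.insert(w)
print(t.nodes, t.slots)  # slots * pointer size plays the role of trie-mem
```

Leaf nodes never grow their array, which is why they contribute nothing to the memory figure.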

trie-mem / ptr-size / nodes = 9.1, so each node (inner or leaf) is allocated 9.1 pointers on average. Compared with a full-alphabet array in every node, this trie already saves a lot of space. However, this figure is not accurate enough to measure the waste. nodes should be replaced with the number of inner nodes, because leaf nodes do not allocate a pointer array (here u-words stands in for the number of leaf nodes, although the two are not the same), and the transition edges that are actually used should be subtracted. The waste value is therefore (trie-mem / ptr-size - nodes) / (nodes - u-words) = 12.8.
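
The two trie figures can be reproduced from the statistics above. The sketch below assumes 4-byte pointers, which is the size that makes the reported 9.1 come out; the variable names are chosen for this example only.

```python
PTR_SIZE = 4  # assumption: 4-byte pointers (32-bit build)

nodes = 34_266
u_words = 12_591
trie_mem = 1_247_308  # bytes

slots = trie_mem / PTR_SIZE        # pointer-array entries allocated
avg_ptrs_per_node = slots / nodes  # average over all nodes

# Unused entries per inner node: subtract one used edge per node and
# approximate the number of inner nodes by nodes - u_words.
trie_waste = (slots - nodes) / (nodes - u_words)

print(round(avg_ptrs_per_node, 1), round(trie_waste, 1))
```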

Datrie's waste value is (datrie-mem / (2 * int-size) - nodes) / (nodes - u-words) = 1.2; with 4-byte integers this is (60,422 - 34,266) / 21,675. The space complexity of datrie is clearly quite good. Of course, I have not thoroughly optimized the datrie implementation; the test basically uses the code from the previous article. Applying the optimizations mentioned in that article would reduce the wasted space further.
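
The same arithmetic for datrie, evaluating (datrie-mem / (2 * int-size) - nodes) / (nodes - u-words) with the statistics above. This is a sketch that assumes 4-byte integers and one base/check pair per cell; the variable names are illustrative.

```python
INT_SIZE = 4  # assumption: 4-byte ints

nodes = 34_266
u_words = 12_591
trie_mem = 1_247_308   # bytes
datrie_mem = 483_376   # bytes

# Each cell holds two ints (base and check).
cells = datrie_mem / (2 * INT_SIZE)

# Unused cells per inner node, with nodes - u_words again standing in
# for the number of inner nodes.
datrie_waste = (cells - nodes) / (nodes - u_words)

# Overall size ratio between the two structures.
ratio = trie_mem / datrie_mem

print(round(datrie_waste, 1), round(ratio, 2))
```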

However, datrie has one big problem: its space must be allocated in advance, because there is no way to know its final size beforehand. If the space turns out to be too small, it has to be reallocated, which inevitably costs time and still does not guarantee that the new size will be sufficient. In addition, any additional information attached to an entry is best stored as a pointer; otherwise the cost of copying it during rearrangement may be high.
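
One common way to soften the reallocation cost is to grow the arrays geometrically. The sketch below uses capacity doubling, which is a substitute technique chosen for illustration and not something the original implementation is known to do; the class name, the -1 free marker, and the reallocs counter are all hypothetical.

```python
class DoubleArray:
    """Only the storage part of a double-array trie, no insertion logic."""

    def __init__(self, capacity=8):
        self.base = [0] * capacity
        self.check = [-1] * capacity  # assumption: -1 marks a free cell
        self.reallocs = 0

    def ensure(self, index):
        """Make sure cell `index` exists, doubling capacity as needed."""
        size = len(self.base)
        if index < size:
            return
        new_size = size
        while new_size <= index:
            new_size *= 2
        extra = new_size - size
        self.base.extend([0] * extra)
        self.check.extend([-1] * extra)
        self.reallocs += 1


da = DoubleArray(capacity=8)
da.ensure(5)   # fits, nothing happens
da.ensure(20)  # grows 8 -> 32 in a single step
print(len(da.base), da.reallocs)
```

Doubling keeps the number of reallocations logarithmic in the final size, at the price of up to half the cells being unused right after a growth step, so it trades some of datrie's space advantage for time.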

In summary, datrie is suitable for engineering applications, especially for fixed datasets.

 

 
