[Erlang0068] Erlang dict

dict is a dictionary implemented as a dynamic hash table, and its interface matches orddict's. The implementation works much like a dynamically growing array. Compared with proplists and orddict, dict handles larger data volumes, so you can switch from orddict to dict when the data grows. dict is built on dynamic hashing; its theoretical basis is the paper "The Design and Implementation of Dynamic Hashing for Sets and Tables in Icon", available at http://www.2007.cccg.ca/~Morin/teaching/5408/refs/a99133

Arrays are easy to index into but costly to insert into and delete from. Linked lists are the opposite: hard to search, cheap to insert into and delete from. In a hash table, the cost of insertion and deletion is governed by the cost of the lookup. A hash table establishes a fixed functional relationship between a data item and its storage location, which is what makes lookups efficient. In linear lists and trees, an item's position in the structure has no fixed relationship to its value, so items are found by comparison, and search efficiency depends on the number of comparisons.

In the Wikipedia article on hash tables, slot and bucket name the same thing:
"The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought."
In dict's implementation, by contrast, segment, slot and bucket are three distinct concepts, from largest to smallest. Their relationship is visible in fetch/2:
fetch(Key, D) ->
    Slot = get_slot(D, Key),
    Bkt = get_bucket(D, Slot),
    try fetch_val(Key, Bkt)
    catch
        badarg -> erlang:error(badarg, [Key, D])
    end.

%% get_slot(Hashdb, Key) -> Slot.
%%  Get the slot.  First hash on the new range, if we hit a bucket
%%  which has not been split use the unsplit buddy bucket.
get_slot(T, Key) ->
    H = erlang:phash(Key, T#dict.maxn),
    if
        H > T#dict.n -> H - T#dict.bso;
        true -> H
    end.

%% get_bucket(Hashdb, Slot) -> Bucket.
get_bucket(T, Slot) -> get_bucket_s(T#dict.segs, Slot).
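To make the buddy-bucket rule in get_slot/2 concrete, here is a minimal stand-alone sketch. It is not dict's code: the hypothetical module slot_sketch takes the three record fields that get_slot/2 reads (n, maxn, bso) as plain arguments and otherwise mirrors the logic above.

```erlang
-module(slot_sketch).
-export([get_slot/4]).

%% Hypothetical stand-alone version of dict's get_slot/2.
%% N = number of active slots, MaxN = current address range,
%% Bso = buddy slot offset (MaxN div 2).
get_slot(Key, N, MaxN, Bso) ->
    %% erlang:phash/2 returns an integer in 1..MaxN
    H = erlang:phash(Key, MaxN),
    if
        %% Slot H is not active yet (its bucket has not been split off),
        %% so use the unsplit "buddy" slot H - Bso instead.
        H > N -> H - Bso;
        true -> H
    end.
```

For a fresh dict (n = maxn = 16, bso = 8) the `H > N` branch is never taken, because erlang:phash/2 already returns a value in 1..16; the fallback only matters after maxn has doubled and only some of the new slots are active.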

 

The segment size is fixed; as the data grows, only the size of the top-level tuple of segments has to change. The trailing elements of the segments tuple are empty segments reserved for later expansion. The segments tuple grows geometrically (it doubles each time), so resizing does not seriously hurt performance. Note that the interface exposed by dict reveals nothing about where the data is actually stored. store/3, append/3, append_list/3, update/3, update/4 and update_counter/3 all check whether the table needs to expand,

while filter/2 and erase/2 check whether it should contract. Because dict grows and shrinks with the data volume, it balances memory consumption against access efficiency.

%% Note: mk_seg/1 must be changed too if seg_size is changed.
-define(seg_size, 16).
-define(max_seg, 32).
-define(expand_load, 5).
-define(contract_load, 3).
-define(exp_size, (?seg_size * ?expand_load)).
-define(con_size, (?seg_size * ?contract_load)).

%% Define a hashtable.  The default values are the standard ones.
-record(dict,
        {size=0,               % number of elements
         n=?seg_size,          % number of activated slots
         maxn=?seg_size,       % maximum slots: the largest number of buckets the
                               % table currently allows; expansion uses it to decide
                               % whether new segments must be added (initially 16)
         bso=?seg_size div 2,  % buddy slot offset (initially 16 div 2 = 8)
         exp_size=?exp_size,   % expansion threshold, initially 16*5 = 80
         con_size=?con_size,   % contraction threshold, initially 16*3 = 48
         empty :: tuple(),     % empty segment
         segs :: tuple()       % segments holding all the data
        }).
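The thresholds in the record can be traced with a small model. The sketch below is hypothetical: it tracks only the bookkeeping fields (size, n, maxn, bso, exp_size, con_size) and assumes that a new key splits one more slot once size would exceed exp_size, and that maxn and bso double when every slot up to maxn is already active. The real dict also rehashes the entries of the split bucket, which is left out here.

```erlang
-module(grow_model).
-export([new/0, insert/1, inserts/2]).

-define(SEG_SIZE, 16).
-define(EXPAND_LOAD, 5).
-define(CONTRACT_LOAD, 3).

%% Hypothetical model of dict's control fields only (no buckets, no data).
-record(ctl, {size = 0,
              n = ?SEG_SIZE,
              maxn = ?SEG_SIZE,
              bso = ?SEG_SIZE div 2,
              exp_size = ?SEG_SIZE * ?EXPAND_LOAD,
              con_size = ?SEG_SIZE * ?CONTRACT_LOAD}).

new() -> #ctl{}.

%% Account for one new key. Assumption: once size would exceed exp_size,
%% activate one more slot and move both thresholds with it.
insert(#ctl{size = S, exp_size = E} = C) when S + 1 > E ->
    C1 = maybe_double(C),
    N = C1#ctl.n + 1,
    C1#ctl{size = S + 1, n = N,
           exp_size = N * ?EXPAND_LOAD,
           con_size = N * ?CONTRACT_LOAD};
insert(#ctl{size = S} = C) ->
    C#ctl{size = S + 1}.

%% When every addressable slot is active, double the address range.
maybe_double(#ctl{n = N, maxn = N, bso = B} = C) ->
    C#ctl{maxn = 2 * N, bso = 2 * B};
maybe_double(C) ->
    C.

%% Apply K inserts in a row.
inserts(0, C) -> C;
inserts(K, C) -> inserts(K - 1, insert(C)).
```

Under these assumptions the 81st insert is the first to trigger a split: n becomes 17, maxn and bso double to 32 and 16, and the thresholds move to 17*5 = 85 and 17*3 = 51.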

When a dict is created, the empty field is initialized with an empty segment, which then serves as a template for new segments.

new() ->
    Empty = mk_seg(?seg_size),
    #dict{empty=Empty, segs={Empty}}.

mk_seg(16) -> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}.  % 16 was chosen empirically
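Because mk_seg/1 is a fixed literal, it has to be edited whenever seg_size changes. Below is a sketch of a size-agnostic alternative, together with the slot-to-location arithmetic that fixed-size segments imply. The module seg_sketch and its functions are hypothetical; erlang:make_tuple/2 is a standard BIF, and the arithmetic assumes 1-based slots and 16 buckets per segment.

```erlang
-module(seg_sketch).
-export([mk_seg/1, locate/1, get_bucket/2]).

-define(SEG_SIZE, 16).

%% Build an empty segment: a tuple of Size empty buckets (lists).
mk_seg(Size) -> erlang:make_tuple(Size, []).

%% Map a 1-based slot number to {SegmentIndex, BucketIndex}.
locate(Slot) ->
    {(Slot - 1) div ?SEG_SIZE + 1, (Slot - 1) rem ?SEG_SIZE + 1}.

%% Fetch the bucket for Slot from the top-level tuple of segments.
get_bucket(Segs, Slot) ->
    {SegI, BktI} = locate(Slot),
    element(BktI, element(SegI, Segs)).
```

With 16 buckets per segment, slots 1..16 live in the first segment and slot 17 is the first bucket of the second one.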

 

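Key-value format:

-define(kv(K,V), [K|V]).  % Key-Value Pair format

dict stores each key-value pair as a cons cell [K|V] rather than as a tuple {K,V}; when V is not a list, as with [k|v] in the session below, this is an improper list. Look at the implementation of append_bkt further down: my guess is that this layout lets the value Bag be handled as a whole.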
Eshell V5.9.1  (abort with ^G)
1> dict:new().
{dict,0,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}
2> dict:store(k,v,v(1)).
{dict,1,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[[k|v]],[],[],[],[]}}}
3> dict:store(k2,v2,v(2)).
{dict,2,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],
        [[k2|v2]],
        [],[],[],[],[],[],[],[],
        [[k|v]],
        [],[],[],[]}}}
4>
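The lookup side of this layout can be sketched as a walk over one bucket that matches each entry against the ?kv pattern. The module bkt_sketch is hypothetical and only mirrors the idea; it is not dict's internal lookup code.

```erlang
-module(bkt_sketch).
-export([find_val/2]).

%% Same layout as the ?kv macro above: a pair is the cons cell [K|V].
-define(kv(K, V), [K|V]).

%% Walk a bucket (a plain list of [K|V] cells) looking for Key.
find_val(Key, [?kv(Key, Val)|_]) -> {ok, Val};
find_val(Key, [_|Bkt]) -> find_val(Key, Bkt);
find_val(_Key, []) -> error.
```

When the value is itself a list, [k|[a,b]] is simply the proper list [k,a,b], so a bag is stored as one cell whose tail is the whole bag.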

 

%% append_bkt(Key, Val, Bucket) -> {NewBucket,PutCount}.
append_bkt(Key, Val, [?kv(Key,Bag)|Bkt]) -> {[?kv(Key,Bag ++ [Val])|Bkt],0};
append_bkt(Key, Val, [Other|Bkt0]) ->
    {Bkt1,Ic} = append_bkt(Key, Val, Bkt0),
    {[Other|Bkt1],Ic};
append_bkt(Key, Val, []) -> {[?kv(Key,[Val])],1}.

%% app_list_bkt(Key, L, Bucket) -> {NewBucket,PutCount}.
app_list_bkt(Key, L, [?kv(Key,Bag)|Bkt]) -> {[?kv(Key,Bag ++ L)|Bkt],0};
app_list_bkt(Key, L, [Other|Bkt0]) ->
    {Bkt1,Ic} = app_list_bkt(Key, L, Bkt0),
    {[Other|Bkt1],Ic};
app_list_bkt(Key, L, []) -> {[?kv(Key,L)],1}.
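Through the public API, the same behaviour shows up as a growing list per key. A minimal sketch using dict:append/3, dict:append_list/3 and dict:fetch/2 (the module name append_sketch is hypothetical):

```erlang
-module(append_sketch).
-export([run/0]).

%% Build a bag under key k with append/3 and append_list/3, then read it back.
run() ->
    D1 = dict:append(k, a, dict:new()),     % k => [a]
    D2 = dict:append(k, b, D1),             % k => [a, b]
    D3 = dict:append_list(k, [c, d], D2),   % k => [a, b, c, d]
    dict:fetch(k, D3).
```

Since append_bkt builds Bag ++ [Val], each append copies the existing bag, so a long run of appends to a single key costs time quadratic in the bag length.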

 

When should you use gb_trees over dicts? Well, it's not a clear decision. As the benchmark module I have written will show, gb_trees and dicts have somewhat similar performances in many respects. However, the benchmark demonstrates that dicts have the best read speeds while the gb_trees tend to be a little quicker on other operations. You can judge based on your own needs which one would be the best.
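To check the read-speed claim on your own machine, a rough harness like the one below can help. It is a hypothetical sketch built on timer:tc/1, dict:fetch/2 and gb_trees:get/2; the timings depend heavily on the machine, the OTP release and the key distribution, so treat them as indicative only.

```erlang
-module(read_bench).
-export([run/1]).

%% Time N successful lookups in a dict and in a gb_tree holding the same keys.
%% Returns elapsed microseconds per structure.
run(N) ->
    Keys = lists:seq(1, N),
    Pairs = [{K, K} || K <- Keys],          % sorted, unique: a valid orddict
    D = dict:from_list(Pairs),
    T = gb_trees:from_orddict(Pairs),
    {DictUs, _} = timer:tc(fun() -> [dict:fetch(K, D) || K <- Keys] end),
    {TreeUs, _} = timer:tc(fun() -> [gb_trees:get(K, T) || K <- Keys] end),
    [{dict, DictUs}, {gb_trees, TreeUs}].
```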

Oh and also note that while dicts have a fold function, gb_trees don't: they instead have an iterator function, which returns a bit of the tree on which you can call gb_trees:next(Iterator) to get the following values in order. What this means is that you need to write your own recursive functions on top of gb_trees rather than use a generic fold. On the other hand, gb_trees let you have quick access to the smallest and largest elements of the structure with gb_trees:smallest/1 and gb_trees:largest/1.
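A generic fold can be rebuilt on top of the iterator interface the quote describes. The sketch below (hypothetical module gbt_fold) walks the tree in key order with gb_trees:iterator/1 and gb_trees:next/1:

```erlang
-module(gbt_fold).
-export([fold/3]).

%% Hand-written fold over a gb_tree, visiting keys in ascending order.
%% Fun is called as Fun(Key, Value, Acc).
fold(Fun, Acc0, Tree) ->
    fold_iter(Fun, Acc0, gb_trees:iterator(Tree)).

fold_iter(Fun, Acc, Iter0) ->
    case gb_trees:next(Iter0) of
        none -> Acc;
        {Key, Val, Iter1} -> fold_iter(Fun, Fun(Key, Val, Acc), Iter1)
    end.
```

Because the iterator yields keys in ascending order, a fold that prepends each pair to its accumulator returns the pairs in descending key order; gb_trees:smallest/1 and gb_trees:largest/1 return {Key, Value} pairs directly.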

Link: http://learnyousomeerlang.com/a-short-visit-to-common-data-structures

For more information, see the following articles:
[1] Erlang dictionary example: http://abel-perez.com/erlang-dictionary-example
[2] Working with dictionaries in Erlang: http://www.techrepublic.com/article/working-with-dictionaries-in-erlang/6342630
[3] Wikipedia, Hash table: http://en.wikipedia.org/wiki/Hash_table

 

 

Good night!
