[Erlang0067] Erlang gb_trees

Source: Internet
Author: User
gb_trees (general balanced trees) is a binary search tree that is usually used as an ordered dictionary. It has no extra storage overhead compared with an ordinary unbalanced binary tree. Storage overhead here means additional metadata that records node-related information; dict and array carry such descriptive information, whereas gb_trees is self-describing. Its performance is in general better than that of AVL trees. (General balanced trees: http://www.2007.cccg.ca /~ Morin/teaching/5408/refs/a99133) Compared with proplists and orddict, it can support larger data volumes.

In a balanced binary tree (also known as an AVL tree), the depths of the left and right subtrees of any node differ by at most 1 in absolute value; this difference is called the balance factor. Search, insert, and delete are all O(log n) in both the average and the worst case. In an AVL tree, an insertion or deletion that leaves a node unbalanced triggers rebalancing, which is achieved through four types of rotation: LL, LR, RR, and RL. In gb_trees, because deleting a node does not increase the height of the tree, no rebalancing is performed after a deletion.

Note: gb_trees compares keys with the == operator (arithmetic equality) rather than =:=, so 1 and 1.0 count as the same key.

gb_trees Data Structure
gb_trees = {size, tree}
tree = {key, value, smaller, bigger} | nil
smaller = tree
bigger = tree
Gb_trees operations

 

Eshell V5.9.1  (abort with ^G)
1> G=gb_trees.
gb_trees
2> G:empty().
{0,nil}
3> G:insert(k,v,G:empty()).
{1,{k,v,nil,nil}}
4> G:insert(k1,v1,v(3)).
{2,{k,v,nil,{k1,v1,nil,nil}}}
5> G:insert(k2,v3,v(4)).
{3,{k,v,nil,{k1,v1,nil,{k2,v3,nil,nil}}}}
6> G:insert(k0,v0,v(4)).
{3,{k,v,nil,{k1,v1,{k0,v0,nil,nil},nil}}}
7> G:insert(k0,v0,v(5)).
{4,{k,v,nil,{k1,v1,{k0,v0,nil,nil},{k2,v3,nil,nil}}}}
8> G:insert(k0,v0,v(6)).
** exception error: {key_exists,k0}
     in function  gb_trees:insert_1/4 (gb_trees.erl, line 321)
     in call from gb_trees:insert_1/4 (gb_trees.erl, line 283)
     in call from gb_trees:insert_1/4 (gb_trees.erl, line 300)
     in call from gb_trees:insert/3 (gb_trees.erl, line 280)
An insert can trigger rebalancing. A delete does not, and balance(T) does not need to be called in most cases; it exists so that, after a large number of elements have been deleted, lookup time can be reduced by rebalancing. insert(X, V, T) raises the error {key_exists, Key} if the key is already present. Because deleting a node does not increase the tree height, the tree is not rebalanced after a deletion. Note the following comparison:
Eshell V5.9.1  (abort with ^G)
1> T={8,{k,v,nil,{k1,v1,{k0,v0,nil,nil},{k4,v4,{k3,v3,nil,nil},{k5,v5,nil,{k6,v6,nil,{k7,v7,nil,nil}}}}}}}.
{8,{k,v,nil,{k1,v1,{k0,v0,nil,nil},{k4,v4,{k3,v3,nil,nil},{k5,v5,nil,{k6,v6,nil,{k7,v7,nil,nil}}}}}}}
2> gb_trees:delete(k1,T).
{7,{k,v,nil,{k3,v3,{k0,v0,nil,nil},{k4,v4,nil,{k5,v5,nil,{k6,v6,nil,{k7,v7,nil,nil}}}}}}}
3> gb_trees:balance(v(2)).
{7,{k4,v4,{k0,v0,{k,v,nil,nil},{k3,v3,nil,nil}},{k6,v6,{k5,v5,nil,nil},{k7,v7,nil,nil}}}}
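The same behaviour can be reproduced through the public API, without writing the tree tuple by hand. The following is a minimal sketch; the module name gb_delete_demo and the function run/0 are made up for illustration. It shows the key_exists error on a duplicate insert, deletion with delete_any/2 (which, unlike delete/2, tolerates missing keys), and an explicit balance/1 call that changes the shape of the tree but not its contents.

```erlang
-module(gb_delete_demo).
-export([run/0]).

%% Hypothetical demo: build a tree, provoke key_exists, delete, rebalance.
run() ->
    T0 = lists:foldl(fun(K, Acc) -> gb_trees:insert(K, K * K, Acc) end,
                     gb_trees:empty(),
                     lists:seq(1, 8)),
    %% Inserting an existing key raises error:{key_exists, Key}.
    Dup = try gb_trees:insert(3, x, T0)
          catch error:{key_exists, K0} -> {key_exists, K0}
          end,
    %% delete_any/2 ignores keys that are not present (99 here).
    T1 = lists:foldl(fun gb_trees:delete_any/2, T0, [1, 3, 5, 7, 99]),
    %% balance/1 only reshapes the tree; size and contents stay the same.
    T2 = gb_trees:balance(T1),
    {Dup, gb_trees:size(T2), gb_trees:to_list(T1) =:= gb_trees:to_list(T2)}.
```

Calling gb_delete_demo:run() should return {{key_exists,3},4,true}.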

 

The lookup function below shows a typical gb_trees traversal:
lookup(Key, {_, T}) ->
    lookup_1(Key, T).

%% The term order is an arithmetic total order, so we should not
%% test exact equality for the keys. (If we do, then it becomes
%% possible that neither `>', `<', nor `=:=' matches.) Testing '<'
%% and '>' first is statistically better than testing for
%% equality, and also allows us to skip the test completely in the
%% remaining case.

lookup_1(Key, {Key1, _, Smaller, _}) when Key < Key1 ->
    lookup_1(Key, Smaller);
lookup_1(Key, {Key1, _, _, Bigger}) when Key > Key1 ->
    lookup_1(Key, Bigger);
lookup_1(_, {_, Value, _, _}) ->
    {value, Value};
lookup_1(_, nil) ->
    none.
lookup and get return their results in different forms:
6> gb_trees:lookup(k1,T).
{value,v1}
7> gb_trees:get(k1,T).
v1
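To make the difference concrete, here is a small sketch (module name gb_lookup_demo and function run/0 are hypothetical). It shows lookup/2 on a present and a missing key, get/2 crashing on a missing key, and the effect of the arithmetic key ordering mentioned in the source comment: the float 1.0 finds the integer key 1.

```erlang
-module(gb_lookup_demo).
-export([run/0]).

%% Hypothetical demo of lookup/2 versus get/2.
run() ->
    T = gb_trees:from_orddict([{1, one}, {2, two}]),
    Found   = gb_trees:lookup(1, T),
    Missing = gb_trees:lookup(3, T),
    %% get/2 assumes the key is present and raises an error otherwise.
    Crash = try gb_trees:get(3, T)
            catch error:_ -> crashed
            end,
    %% Keys are ordered arithmetically, so 1.0 matches the key 1.
    Float = gb_trees:lookup(1.0, T),
    {Found, Missing, Crash, Float}.
```

gb_lookup_demo:run() should return {{value,one},none,crashed,{value,one}}. When a key may be absent, lookup/2 is the safer choice; get/2 fits cases where absence is a bug.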

 

The update function performs a similar traversal, rebuilding the nodes along the path as it goes:

update(Key, Val, {S, T}) ->
    T1 = update_1(Key, Val, T),
    {S, T1}.

%% See `lookup' for notes on the term comparison order.

update_1(Key, Value, {Key1, V, Smaller, Bigger}) when Key < Key1 ->
    {Key1, V, update_1(Key, Value, Smaller), Bigger};
update_1(Key, Value, {Key1, V, Smaller, Bigger}) when Key > Key1 ->
    {Key1, V, Smaller, update_1(Key, Value, Bigger)};
update_1(Key, Value, {_, _, Smaller, Bigger}) ->
    {Key, Value, Smaller, Bigger}.
enter is a composite operation, equivalent to update_or_insert: if the key exists it is updated, otherwise it is inserted:
enter(Key, Val, T) ->
    case is_defined(Key, T) of
        true ->
            update(Key, Val, T);
        false ->
            insert(Key, Val, T)
    end.
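A typical use of enter/3 is accumulating values when it is not known in advance whether a key is already present. The sketch below counts occurrences of list elements; the module gb_enter_demo and the functions count/1 and bump/2 are hypothetical names chosen for this example.

```erlang
-module(gb_enter_demo).
-export([count/1]).

%% Count how often each element occurs; result is sorted by element.
count(Items) ->
    Tree = lists:foldl(fun bump/2, gb_trees:empty(), Items),
    gb_trees:to_list(Tree).

%% Read the current count (0 if absent), then store the new count.
%% enter/3 inserts or updates as needed, so no key_exists handling is required.
bump(Item, Tree) ->
    N = case gb_trees:lookup(Item, Tree) of
            {value, Old} -> Old;
            none         -> 0
        end,
    gb_trees:enter(Item, N + 1, Tree).
```

For example, gb_enter_demo:count([b, a, b, c, b, a]) should return [{a,2},{b,3},{c,1}].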
is_defined/2, lookup/2, and get/2 are all tail-recursive, whereas keys(T) and values(T) are not implemented with tail recursion:
keys({_, T}) ->
    keys(T, []).

keys({Key, _Value, Small, Big}, L) ->
    keys(Small, [Key | keys(Big, L)]);
keys(nil, L) -> L.

values({_, T}) ->
    values(T, []).

values({_Key, Value, Small, Big}, L) ->
    values(Small, [Value | values(Big, L)]);
values(nil, L) -> L.
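Because both functions walk the bigger subtree first and prepend while returning through the smaller one, the resulting lists come out in ascending key order regardless of insertion order. A short sketch (module gb_keys_demo and run/0 are hypothetical names):

```erlang
-module(gb_keys_demo).
-export([run/0]).

%% Insert in arbitrary order; keys/1, values/1 and to_list/1 are key-ordered.
run() ->
    T = lists:foldl(fun({K, V}, Acc) -> gb_trees:insert(K, V, Acc) end,
                    gb_trees:empty(),
                    [{c, 3}, {a, 1}, {b, 2}]),
    {gb_trees:keys(T), gb_trees:values(T), gb_trees:to_list(T)}.
```

gb_keys_demo:run() should return {[a,b,c],[1,2,3],[{a,1},{b,2},{c,3}]}.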
smallest(T) and largest(T), as the names suggest, return the key-value pairs with the smallest and largest keys; smallest/1 and largest/1 are implemented with tail recursion. take_smallest(T) returns {X, V, T1}, where X and V are the key and value of the smallest entry and T1 is the new tree with that entry removed. take_largest(T) returns {X, V, T1} analogously for the largest entry.
18> gb_trees:largest(T).
{k7,v7}
19> gb_trees:take_largest(T).
{k7,v7,
 {7,
  {k,v,nil,
   {k1,v1,
    {k0,v0,nil,nil},
    {k4,v4,{k3,v3,nil,nil},{k5,v5,nil,{k6,v6,nil,nil}}}}}}}
20> gb_trees:smallest(T).
{k,v}
21> gb_trees:take_smallest(T).
{k,v,
 {7,
  {k1,v1,
   {k0,v0,nil,nil},
   {k4,v4,
    {k3,v3,nil,nil},
    {k5,v5,nil,{k6,v6,nil,{k7,v7,nil,nil}}}}}}}
Let's look at the implementation of take_largest:
take_largest({Size, Tree}) when is_integer(Size), Size >= 0 ->
    {Key, Value, Smaller} = take_largest1(Tree),
    {Key, Value, {Size - 1, Smaller}}.

take_largest1({Key, Value, Smaller, nil}) ->
    {Key, Value, Smaller};
take_largest1({Key, Value, Smaller, Larger}) ->
    {Key1, Value1, Larger1} = take_largest1(Larger),
    {Key1, Value1, {Key, Value, Smaller, Larger1}}.
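One way to use the take_ functions is to treat the tree as a simple priority queue. The sketch below drains a tree in ascending key order with take_smallest/1; the module gb_pq_demo and the function drain/1 are hypothetical. Note that take_smallest/1 and take_largest/1 assume a non-empty tree, so the sketch checks is_empty/1 first.

```erlang
-module(gb_pq_demo).
-export([drain/1]).

%% Return all {Key, Value} pairs in ascending key order by repeatedly
%% removing the smallest entry.
drain(Tree) ->
    drain(Tree, []).

drain(Tree, Acc) ->
    case gb_trees:is_empty(Tree) of
        true ->
            lists:reverse(Acc);
        false ->
            {K, V, Rest} = gb_trees:take_smallest(Tree),
            drain(Rest, [{K, V} | Acc])
    end.
```

For a plain full traversal, to_list/1 or the iterator is the more direct tool; the take_ functions are useful when entries are consumed one at a time between other updates.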
What if you want to traverse the entire tree? gb_trees provides an iterator:
12> gb_trees:next(gb_trees:iterator(T)).
{k,v,
 [{k0,v0,nil,nil},
  {k1,v1,
   {k0,v0,nil,nil},
   {k4,v4,
    {k3,v3,nil,nil},
    {k5,v5,nil,{k6,v6,nil,{k7,v7,nil,nil}}}}}]}
13> {Key,Value,I}=gb_trees:next(gb_trees:iterator(T)).
{k,v,
 [{k0,v0,nil,nil},
  {k1,v1,
   {k0,v0,nil,nil},
   {k4,v4,
    {k3,v3,nil,nil},
    {k5,v5,nil,{k6,v6,nil,{k7,v7,nil,nil}}}}}]}
14> {Key2,Value2,I2}=gb_trees:next(I).
{k0,v0,
 [{k1,v1,
   {k0,v0,nil,nil},
   {k4,v4,
    {k3,v3,nil,nil},
    {k5,v5,nil,{k6,v6,nil,{k7,v7,nil,nil}}}}}]}
15> gb_trees:iterator(T).
[{k,v,nil,
  {k1,v1,
   {k0,v0,nil,nil},
   {k4,v4,
    {k3,v3,nil,nil},
    {k5,v5,nil,{k6,v6,nil,{k7,v7,nil,nil}}}}}}]
16> I.
[{k0,v0,nil,nil},
 {k1,v1,
  {k0,v0,nil,nil},
  {k4,v4,
   {k3,v3,nil,nil},
   {k5,v5,nil,{k6,v6,nil,{k7,v7,nil,nil}}}}}]

Traversing the entire tree with the iterator is very efficient, only a little slower than traversing a list of the same size. That may seem mysterious, but the implementation is very simple:

iterator({_, T}) ->
    iterator_1(T).

iterator_1(T) ->
    iterator(T, []).

%% The iterator structure is really just a list corresponding to
%% the call stack of an in-order traversal. This is quite fast.

iterator({_, _, nil, _} = T, As) ->
    [T | As];
iterator({_, _, L, _} = T, As) ->
    iterator(L, [T | As]);
iterator(nil, As) ->
    As.
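Since the gb_trees version discussed here has no fold of its own, a traversal is written as a small recursive function over iterator/1 and next/1. The following is one possible sketch of a generic fold; the module gb_fold_demo and the functions fold/3 and fold_iter/3 are hypothetical names, not part of gb_trees. It treats the iterator as opaque and relies only on next/1 returning {Key, Value, Iter} or none.

```erlang
-module(gb_fold_demo).
-export([fold/3]).

%% fold(Fun, Acc0, Tree): call Fun(Key, Value, Acc) for every entry,
%% in ascending key order, threading the accumulator through.
fold(Fun, Acc0, Tree) ->
    fold_iter(Fun, Acc0, gb_trees:iterator(Tree)).

fold_iter(Fun, Acc, Iter0) ->
    case gb_trees:next(Iter0) of
        {Key, Value, Iter1} ->
            fold_iter(Fun, Fun(Key, Value, Acc), Iter1);
        none ->
            Acc
    end.
```

For example, summing all values: gb_fold_demo:fold(fun(_K, V, Sum) -> V + Sum end, 0, Tree).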

 

The mochiweb_headers module of the mochiweb project is implemented with gb_trees:
%% @spec enter(key(), value(), headers()) -> headers()
%% @doc Insert the pair into the headers, replacing any pre-existing key.
enter(K, V, T) ->
    K1 = normalize(K),
    V1 = any_to_list(V),
    gb_trees:enter(K1, {K, V1}, T).

%% @spec insert(key(), value(), headers()) -> headers()
%% @doc Insert the pair into the headers, merging with any pre-existing key.
%%      A merge is done with Value = V0 ++ ", " ++ V1.
insert(K, V, T) ->
    K1 = normalize(K),
    V1 = any_to_list(V),
    try gb_trees:insert(K1, {K, V1}, T)
    catch
        error:{key_exists, _} ->
            {K0, V0} = gb_trees:get(K1, T),
            V2 = merge(K1, V1, V0),
            gb_trees:update(K1, {K0, V2}, T)
    end.

%% @spec delete_any(key(), headers()) -> headers()
%% @doc Delete the header corresponding to key if it is present.
delete_any(K, T) ->
    K1 = normalize(K),
    gb_trees:delete_any(K1, T).
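The pattern in this excerpt is to store each entry under a normalized key while keeping the original spelling next to the value. The sketch below imitates that pattern in isolation. It is not mochiweb code: the module header_store_demo and the functions new/0, store/3 and fetch/2 are hypothetical, and it assumes that normalization means lower-casing the name (normalize/1 is not shown in the excerpt). string:lowercase/1 requires OTP 20 or later.

```erlang
-module(header_store_demo).
-export([new/0, store/3, fetch/2]).

new() ->
    gb_trees:empty().

%% Store under a lower-cased key, keeping the caller's spelling
%% alongside the value as {OriginalName, Value}.
store(Name, Value, Tree) ->
    gb_trees:enter(string:lowercase(Name), {Name, Value}, Tree).

%% Look up case-insensitively; returns undefined when absent.
fetch(Name, Tree) ->
    case gb_trees:lookup(string:lowercase(Name), Tree) of
        {value, {_Original, Value}} -> Value;
        none                        -> undefined
    end.
```

With T = header_store_demo:store("Content-Type", "text/html", header_store_demo:new()), the call header_store_demo:fetch("CONTENT-TYPE", T) should return "text/html".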

 

When should you use gb_trees over dicts? Well, it's not a clear decision. As the benchmark module I have written will show, gb_trees and dicts have somewhat similar performance in many respects. However, the benchmark demonstrates that dicts have the best read speeds while gb_trees tend to be a little quicker on other operations. You can judge based on your own needs which one would be the best.

Oh, and also note that while dicts have a fold function, gb_trees don't: they instead have an iterator function, which returns a bit of the tree on which you can call gb_trees:next(Iterator) to get the following values in order. What this means is that you need to write your own recursive functions on top of gb_trees rather than use a generic fold. On the other hand, gb_trees let you have quick access to the smallest and largest elements of the structure with gb_trees:smallest/1 and gb_trees:largest/1.

Link: http://learnyousomeerlang.com/a-short-visit-to-common-data-structures

Can you answer the following questions?

Q: Why does mochiweb_headers use gb_trees as the storage structure? Why not dict or other data structures?

Good night!

 

 
