MySQL source code study: a not-so-simple hash

Source: Internet
Author: User

Hash tables are commonly used to map different values to different locations, so that a lookup needs neither a sequential scan nor a binary search, which reduces query time. A conventional hash table fixes its buckets in advance, defines a hash function, and then hashes values into those buckets. The hash in MySQL, however, has no fixed set of buckets, and its hash function changes dynamically as records are added. This article looks at it without going into great depth.

Basic struct

The HASH struct and its related function interfaces are defined in include/hash.h and mysys/hash.c. The HASH struct is defined as follows.

typedef struct st_hash {
  size_t key_offset, key_length;    /* Length of key if const length */
  size_t blength;
  ulong records;
  uint flags;
  DYNAMIC_ARRAY array;              /* Place for hash_keys */
  my_hash_get_key get_key;
  void (*free)(void *);
  CHARSET_INFO *charset;
} HASH;

Member name: description

key_offset: offset of the key inside the record; only meaningful when no get_key function is specified.

key_length: length of the key, used when computing the hash value.

blength: an important auxiliary value. It starts at 1, changes dynamically, and is used in the hash function calculation. It can be read as the "bucket length", although it is not the actual number of buckets.

records: the actual number of records.

flags: whether identical elements are allowed. The value is HASH_UNIQUE (1) or 0.

array: the array that stores the elements.

get_key: user-defined function that returns the key of a record; can be NULL.

free: destructor for elements; can be NULL.

charset: character set.

The HASH struct contains a dynamic array struct, DYNAMIC_ARRAY, which is described next. It is defined in include/my_sys.h.

typedef struct st_dynamic_array
{
  uchar *buffer;
  uint elements, max_element;
  uint alloc_increment;
  uint size_of_element;
} DYNAMIC_ARRAY;

Member name: description

buffer: a contiguous address space used to store the data; it can be thought of as the array itself.

elements: the number of elements currently stored.

max_element: the maximum number of elements the buffer can currently hold.

alloc_increment: when the number of elements reaches the upper limit, that is, when the buffer is full, the buffer is extended by alloc_increment elements.

size_of_element: the length of each element.
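To make the role of alloc_increment concrete, here is a minimal sketch of a growable array with the same five fields. It is not the MySQL implementation: the names toy_array, toy_array_init, toy_array_alloc and toy_array_free are hypothetical, and the fallback increment of 16 is an assumption made only for this example.

```c
#include <stdlib.h>

/* Hypothetical simplified stand-in for DYNAMIC_ARRAY; not the MySQL code. */
typedef struct {
  unsigned char *buffer;      /* contiguous storage for all elements */
  unsigned elements;          /* slots currently in use */
  unsigned max_element;       /* slots currently allocated */
  unsigned alloc_increment;   /* slots added each time the buffer is full */
  unsigned size_of_element;   /* bytes per element */
} toy_array;

static void toy_array_init(toy_array *a, unsigned elem_size, unsigned increment)
{
  a->buffer = NULL;
  a->elements = 0;
  a->max_element = 0;
  a->alloc_increment = increment ? increment : 16;  /* assumed default */
  a->size_of_element = elem_size;
}

/* Return a pointer to a fresh slot at the end of the array, growing the
   buffer by alloc_increment slots when it is full. NULL on failure. */
static void *toy_array_alloc(toy_array *a)
{
  if (a->elements == a->max_element) {
    size_t new_max = (size_t) a->max_element + a->alloc_increment;
    unsigned char *p = realloc(a->buffer, new_max * a->size_of_element);
    if (!p)
      return NULL;
    a->buffer = p;
    a->max_element = (unsigned) new_max;
  }
  return a->buffer + (size_t) a->elements++ * a->size_of_element;
}

static void toy_array_free(toy_array *a)
{
  free(a->buffer);
  a->buffer = NULL;
  a->elements = a->max_element = 0;
}
```

With an increment of 4, the fifth call to toy_array_alloc finds the buffer full and grows it from 4 to 8 slots.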

Initialization function

Two initialization entry points are exposed: my_hash_init and my_hash_init2. They differ only in whether growth_size is passed (it sets alloc_increment of the DYNAMIC_ARRAY). Both expand to _my_hash_init, whose code is in mysys/hash.c.

#define my_hash_init(A,B,C,D,E,F,G,H) \
          _my_hash_init(A,0,B,C,D,E,F,G,H)
#define my_hash_init2(A,B,C,D,E,F,G,H,I) \
          _my_hash_init(A,B,C,D,E,F,G,H,I)

/**
  @brief Initialize the hash

  @details

  Initialize the hash, by defining and giving valid values for
  its elements. The failure to allocate memory for the
  hash->array element will not result in a fatal failure. The
  dynamic array that is part of the hash will allocate memory
  as required during insertion.

  @param[in,out] hash         The hash that is initialized
  @param[in]     charset      The character set information
  @param[in]     size         The hash size
  @param[in]     key_offset   The key offset for the hash
  @param[in]     key_length   The length of the key used in
                              the hash
  @param[in]     get_key      get the key for the hash
  @param[in]     free_element pointer to the function that
                              does cleanup
  @return        indicates success or failure of initialization
    @retval 0 success
    @retval 1 failure
*/
my_bool
_my_hash_init(HASH *hash, uint growth_size, CHARSET_INFO *charset,
              ulong size, size_t key_offset, size_t key_length,
              my_hash_get_key get_key,
              void (*free_element)(void *), uint flags)
{
  DBUG_ENTER("my_hash_init");
  DBUG_PRINT("enter", ("hash: 0x%lx  size: %u", (long) hash, (uint) size));

  hash->records = 0;
  hash->key_offset = key_offset;
  hash->key_length = key_length;
  hash->blength = 1;
  hash->get_key = get_key;
  hash->free = free_element;
  hash->flags = flags;
  hash->charset = charset;
  DBUG_RETURN(my_init_dynamic_array_ci(&hash->array,
                                       sizeof(HASH_LINK), size, growth_size));
}

As you can see, _my_hash_init mainly initializes the HASH struct and hash->array (a DYNAMIC_ARRAY struct).

Dynamic HASH function

Let's first look at the definition of the hash function:

static inline char *
my_hash_key(const HASH *hash, const uchar *record, size_t *length,
            my_bool first)
{
  if (hash->get_key)
    return (char *) (*hash->get_key)(record, length, first);
  *length = hash->key_length;
  return (char *) record + hash->key_offset;
}

static uint my_hash_mask(my_hash_value_type hashnr, size_t buffmax,
                         size_t maxlength)
{
  if ((hashnr & (buffmax - 1)) < maxlength) return (hashnr & (buffmax - 1));
  return (hashnr & ((buffmax >> 1) - 1));
}

my_hash_key parameters:

hash: the HASH struct.

record: the element (record) to be inserted.

length: receives the length of the key.

first: auxiliary parameter.

my_hash_mask parameters:

hashnr: the hash value computed from the key that my_hash_key returns.

buffmax: blength in the HASH struct.

maxlength: the actual number of buckets, that is, records in the HASH struct.
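The two ways my_hash_key can locate a key (a user callback, or a fixed offset and length inside the record) can be sketched as follows. This is only an illustration under stated assumptions: toy_record, toy_get_key, toy_key and name_key are hypothetical names, and the callback here omits the `first` argument that the real my_hash_get_key takes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type used only for this illustration. */
typedef struct {
  int  id;
  char name[8];               /* fixed-length key stored inside the record */
} toy_record;

/* Simplified callback type (assumption: no `first` argument). */
typedef const unsigned char *(*toy_get_key)(const unsigned char *record,
                                            size_t *length);

/* Same idea as my_hash_key: use the callback if there is one,
   otherwise return record + key_offset and report key_length. */
static const unsigned char *toy_key(const unsigned char *record, size_t *length,
                                    toy_get_key get_key,
                                    size_t key_offset, size_t key_length)
{
  if (get_key)
    return get_key(record, length);
  *length = key_length;
  return record + key_offset;
}

/* Example callback: the key is the name up to its terminating NUL. */
static const unsigned char *name_key(const unsigned char *record, size_t *length)
{
  const toy_record *r = (const toy_record *) record;
  *length = strlen(r->name);
  return (const unsigned char *) r->name;
}
```

Without a callback, passing offsetof(toy_record, name) and sizeof(((toy_record *) 0)->name) yields the whole 8-byte field as the key; with name_key, the key of a record named "abc" is only 3 bytes long.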

You may ask why there are two functions. This is in fact similar to what we usually do. The first function, my_hash_key, obtains the hash key from the record. Once a hash value has been computed from that key, we normally take it modulo the number of buckets so that the result falls inside our buckets; that is the job of my_hash_mask, whose first argument is the hash value. That much is easy to understand. The only painful part is the implementation of my_hash_mask: its result depends on the second and third arguments, that is, on blength and records in the HASH struct, and both of them change dynamically.
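A few concrete values make the behaviour of my_hash_mask easier to see. The sketch below repeats the same logic with plain C types (toy_mask is a hypothetical name, and the types are an assumption made so the snippet compiles on its own).

```c
#include <stddef.h>

/* Same logic as my_hash_mask, with plain C types for a standalone sketch. */
static unsigned toy_mask(unsigned long hashnr, size_t buffmax, size_t maxlength)
{
  /* First try the full mask (hashnr mod buffmax). */
  if ((hashnr & (buffmax - 1)) < maxlength)
    return (unsigned) (hashnr & (buffmax - 1));
  /* That bucket is not in use yet: fall back to hashnr mod (buffmax / 2). */
  return (unsigned) (hashnr & ((buffmax >> 1) - 1));
}
```

With blength = 8 and records = 6, a hash value of 5 maps to bucket 5, because 5 & 7 = 5 is below 6. A hash value of 6 gives 6 & 7 = 6, which is not below 6, so it falls back to 6 & 3 = 2. The result is always a bucket number below records.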

I was confused at this point. After a lot of searching with Baidu and Google, I finally found a reply from a MySQL expert:

Hi!

"Yan" == Yan Yu writes:

Yan> Dear MySQL experts:
Yan> Thank you so much for your reply to my previous Qs, they are very
Yan> helpful!
Yan> could someone please help me understand function my_hash_insert()
Yan> in mysys/hash.cc?
Yan> what are lines 352-429 trying to achieve? Are they just some
Yan> optimization to shuffle existing
Yan> hash entries in the table (since the existing hash entries may be in
Yan> the wrong slot due to chaining
Yan> in the case of hash collision)?

The hash algorithm is based on dynamic hashing without empty slots.
This means that when you insert a new key, in some cases a small set
of old keys needs to be moved to other buckets. This is what the code
does.

Regards,
Monty

The key point, highlighted in red in the original post, is dynamic hashing. I had not heard of dynamic hashing before, and found a paper titled "Dynamic Hash Tables" on the internet. Its basic principle was illustrated with a figure, which is not reproduced here.

The essence of dynamic hashing is the design of the hash function; the dynamic hash function shown in the figure is only the example used in the paper. Next, let's look at hash insertion in MySQL: my_hash_insert.
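Monty's remark that "a small set of old keys needs to be moved" can be checked directly against the mask function. The sketch below (toy_mask and toy_moves are hypothetical names, written with plain C types) compares the bucket of a hash value before and after the number of buckets in use grows from n to n + 1, with blength held fixed during that step.

```c
#include <stddef.h>

/* Same logic as my_hash_mask, repeated so this sketch is standalone. */
static size_t toy_mask(unsigned long hashnr, size_t buffmax, size_t maxlength)
{
  if ((hashnr & (buffmax - 1)) < maxlength)
    return hashnr & (buffmax - 1);
  return hashnr & ((buffmax >> 1) - 1);
}

/* Returns 1 if hash value h lands in a different bucket once the number
   of buckets in use grows from n to n + 1. */
static int toy_moves(unsigned long h, size_t blength, size_t n)
{
  return toy_mask(h, blength, n) != toy_mask(h, blength, n + 1);
}
```

With blength = 8 and n = 5, the only hash values that move are those whose low three bits equal 5 (for example 5 and 13). They leave bucket 1, which is n minus half of blength, and go to the new bucket 5. Every other hash value stays where it was, which is why an insert only has to walk a single chain.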

A not-so-deep analysis of my_hash_insert

First, here is the source code of my_hash_insert. The code is in mysys/hash.c.

my_bool my_hash_insert(HASH *info, const uchar *record)
{
  int flag;
  size_t idx, halfbuff, first_index;
  my_hash_value_type hash_nr;
  uchar *UNINIT_VAR(ptr_to_rec), *UNINIT_VAR(ptr_to_rec2);
  HASH_LINK *data, *empty, *UNINIT_VAR(gpos), *UNINIT_VAR(gpos2), *pos;

  if (HASH_UNIQUE & info->flags)
  {
    uchar *key = (uchar *) my_hash_key(info, record, &idx, 1);
    if (my_hash_search(info, key, idx))
      return (TRUE);                            /* Duplicate entry */
  }

  flag = 0;
  if (!(empty = (HASH_LINK *) alloc_dynamic(&info->array)))
    return (TRUE);                              /* No more memory */

  data = dynamic_element(&info->array, 0, HASH_LINK *);
  halfbuff = info->blength >> 1;

  idx = first_index = info->records - halfbuff;
  if (idx != info->records)                     /* If some records */
  {
    do
    {
      pos = data + idx;
      hash_nr = rec_hashnr(info, pos->data);
      if (flag == 0)                            /* First loop; Check if ok */
        if (my_hash_mask(hash_nr, info->blength, info->records) != first_index)
          break;
      if (!(hash_nr & halfbuff))
      {                                         /* Key will not move */
        if (!(flag & LOWFIND))
        {
          if (flag & HIGHFIND)
          {
            flag = LOWFIND | HIGHFIND;
            /* key shall be moved to the current empty position */
            gpos = empty;
            ptr_to_rec = pos->data;
            empty = pos;                        /* This place is now free */
          }
          else
          {
            flag = LOWFIND | LOWUSED;           /* key isn't changed */
            gpos = pos;
            ptr_to_rec = pos->data;
          }
        }
        else
        {
          if (!(flag & LOWUSED))
          {
            /* Change link of previous LOW-key */
            gpos->data = ptr_to_rec;
            gpos->next = (uint) (pos - data);
            flag = (flag & HIGHFIND) | (LOWFIND | LOWUSED);
          }
          gpos = pos;
          ptr_to_rec = pos->data;
        }
      }
      else
      {                                         /* key will be moved */
        if (!(flag & HIGHFIND))
        {
          flag = (flag & LOWFIND) | HIGHFIND;
          /* key shall be moved to the last (empty) position */
          gpos2 = empty; empty = pos;
          ptr_to_rec2 = pos->data;
        }
        else
        {
          if (!(flag & HIGHUSED))
          {
            /* Change link of previous hash-key and save */
            gpos2->data = ptr_to_rec2;
            gpos2->next = (uint) (pos - data);
            flag = (flag & LOWFIND) | (HIGHFIND | HIGHUSED);
          }
          gpos2 = pos;
          ptr_to_rec2 = pos->data;
        }
      }
    }
    while ((idx = pos->next) != NO_RECORD);

    if ((flag & (LOWFIND | LOWUSED)) == LOWFIND)
    {
      gpos->data = ptr_to_rec;
      gpos->next = NO_RECORD;
    }
    if ((flag & (HIGHFIND | HIGHUSED)) == HIGHFIND)
    {
      gpos2->data = ptr_to_rec2;
      gpos2->next = NO_RECORD;
    }
  }
  /* Check if we are at the empty position */

  idx = my_hash_mask(rec_hashnr(info, record), info->blength, info->records + 1);
  pos = data + idx;
  if (pos == empty)
  {
    pos->data = (uchar *) record;
    pos->next = NO_RECORD;
  }
  else
  {
    /* Check if more records in same hash-nr family */
    empty[0] = pos[0];
    gpos = data + my_hash_rec_mask(info, pos, info->blength, info->records + 1);
    if (pos == gpos)
    {
      pos->data = (uchar *) record;
      pos->next = (uint) (empty - data);
    }
    else
    {
      pos->data = (uchar *) record;
      pos->next = NO_RECORD;
      movelink(data, (uint) (pos - data), (uint) (gpos - data), (uint) (empty - data));
    }
  }
  if (++info->records == info->blength)
    info->blength += info->blength;
  return (0);
}

The dynamic hash function is repeated here for reference:

static uint my_hash_mask(my_hash_value_type hashnr, size_t buffmax,
                         size_t maxlength)
{
  if ((hashnr & (buffmax - 1)) < maxlength) return (hashnr & (buffmax - 1));
  return (hashnr & ((buffmax >> 1) - 1));
}

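As you can see, the hash function is essentially the hash value modulo buffmax (written as hashnr & (buffmax - 1)), where buffmax is blength in the HASH struct. The last lines of my_hash_insert, info->blength += info->blength, show that blength doubles each time records catches up with it. Since its initial value is 1, blength is always a power of two, blength = 2^n, and after every insert blength is greater than records. If the masked value is not smaller than records, that bucket is not in use yet, and the function falls back to the hash value modulo blength / 2. The basic meaning of this dynamic hash function is therefore key % (2^n). The original post illustrated this function with a figure, which is not reproduced here.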
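The relation between records and blength can be checked by replaying only the counter update from the end of my_hash_insert. The helper name blength_after is hypothetical; the update rule itself is the one shown in the listing.

```c
#include <stddef.h>

/* Replays only the counter update at the end of my_hash_insert and
   returns blength after the given number of successful inserts. */
static size_t blength_after(size_t inserts)
{
  size_t records = 0, blength = 1;
  while (inserts--) {
    if (++records == blength)
      blength += blength;       /* double when records catches up */
  }
  return blength;
}
```

After 1, 2, 3, 4 and 5 inserts, blength is 2, 4, 4, 8 and 8. For any number of inserts n >= 1 it is the smallest power of two strictly greater than n, so blength / 2 <= records < blength holds whenever the next insert starts.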

The hash function is now basically clear, but the way MySQL implements it is still worth exploring. Monty's reply also mentions "without empty slots": the number of buckets is allocated according to the actual number of records, so no slot is left empty. The flow of the code is roughly as follows (if you are interested, it is worth working through carefully).

  1. Check the flags to see whether the hash is unique. If it is, look the key up in the hash table; if it is already there (duplicate entry), return an error.
  2. Split a bucket; this is the if (idx != info->records) branch. The branch is somewhat confusing, so a few notes: gpos and ptr_to_rec track the low-half data that has to be relinked, and gpos2 and ptr_to_rec2 track the high-half data that has to be moved. LOWFIND means that a low-half record has been found, and LOWUSED indicates whether the low position has already been adjusted. The HIGH macros have the same meaning for the high half. The test if (!(hash_nr & halfbuff)) decides whether a hash value belongs to the low half or the high half.
  3. Compute the bucket number for the new record and insert it. If an element already occupies that position (the normal case; finding the empty slot there is relatively unlikely), adjust the position of the existing element.
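The three steps above can be sketched on a much simpler table. This is not MySQL's layout: instead of one HASH_LINK array without empty slots, the toy below keeps a separate pointer chain per bucket, uses the key itself as the hash value, and has a fixed capacity. All names (toy_hash, toy_node, toy_insert and so on) are hypothetical. Only the bucket arithmetic (mask function, split bucket = records - halfbuff, doubling of blength) follows the listing.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX 64                      /* fixed capacity keeps the sketch short */

typedef struct toy_node {
  unsigned key;                         /* the key doubles as its own hash value */
  struct toy_node *next;
} toy_node;

typedef struct {
  toy_node *bucket[TOY_MAX];
  size_t blength;                       /* power of two, starts at 1 */
  size_t records;                       /* records stored == buckets in use */
} toy_hash;

static size_t toy_mask(size_t hashnr, size_t buffmax, size_t maxlength)
{
  if ((hashnr & (buffmax - 1)) < maxlength)
    return hashnr & (buffmax - 1);
  return hashnr & ((buffmax >> 1) - 1);
}

static void toy_init(toy_hash *t)
{
  size_t i;
  for (i = 0; i < TOY_MAX; i++)
    t->bucket[i] = NULL;
  t->blength = 1;
  t->records = 0;
}

static int toy_contains(const toy_hash *t, unsigned key)
{
  const toy_node *n;
  if (t->records == 0)                  /* the mask is undefined for 0 buckets */
    return 0;
  for (n = t->bucket[toy_mask(key, t->blength, t->records)]; n; n = n->next)
    if (n->key == key)
      return 1;
  return 0;
}

/* Returns 0 on success, 1 on duplicate key, full table, or out of memory. */
static int toy_insert(toy_hash *t, unsigned key)
{
  size_t halfbuff = t->blength >> 1;
  size_t split = t->records - halfbuff; /* bucket whose keys may move */
  size_t idx;
  toy_node *node, **pp;

  /* Step 1: reject duplicates (this toy always behaves like HASH_UNIQUE). */
  if (t->records == TOY_MAX || toy_contains(t, key))
    return 1;
  if (!(node = malloc(sizeof *node)))
    return 1;

  /* Step 2: bucket number t->records comes into use; keys of bucket
     `split` whose hash has the halfbuff bit set move there. */
  if (split != t->records) {
    pp = &t->bucket[split];
    while (*pp) {
      toy_node *cur = *pp;
      if (cur->key & halfbuff) {
        *pp = cur->next;
        cur->next = t->bucket[t->records];
        t->bucket[t->records] = cur;
      } else {
        pp = &cur->next;
      }
    }
  }

  /* Step 3: place the new key using the mask for records + 1 buckets. */
  idx = toy_mask(key, t->blength, t->records + 1);
  node->key = key;
  node->next = t->bucket[idx];
  t->bucket[idx] = node;

  if (++t->records == t->blength)
    t->blength += t->blength;
  return 0;
}
```

Because each bucket has its own chain here, step 3 never needs to relocate an existing element. The extra bookkeeping in my_hash_insert (empty, gpos, movelink) exists because MySQL packs all chains into one array with no unused slots.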

Excerpted from "No Code in Mind"
