Nginx Basics: The Basic Hash Table

Last Update:2018-07-26 Source: Internet

Author: User


From earlier study I was already familiar with a couple of hash table designs. The hash table in the STL uses separate chaining, with a vector of linked lists as its container; when the total number of elements exceeds a threshold, the vector is expanded.
The hash table in Libevent is similar to the STL one but more elaborate: each bucket may hold a linked list, and each list element may hold a linked list of its own. It is still not hard to understand.
The hash table in Nginx is distinctly different from both.

In Nginx, the hash structure is used to store the mapping from server_name to ngx_http_core_srv_conf_t.
The server_names_hash_max_size directive controls the maximum number of buckets, and server_names_hash_bucket_size controls the size of each bucket.
The following figure shows the hash structure Nginx uses to store server_name values that contain no wildcard characters:

Before going through it, note that the hash table in Nginx is special in that it is static and read-only: new elements cannot be added at run time, and the whole structure and all of its data are laid out when the configuration is initialized. How well this "init" step does its job therefore has a large effect on the performance of run-time lookups.

Assume the hash table has size buckets, so each element lands in bucket [hash % size]. As the figure shows, this hash table resolves collisions in a way that resembles chaining, but it does not use a linked list for elements with the same [hash % size] value. Instead, elements with the same [hash % size] value are stored in one contiguous block of memory that ends with a null pointer.

The two directives above control the maximum number of buckets and the size of each bucket, but they do not say how many buckets the table actually has. Suppose the number of buckets is fixed at some value. If there are too many collisions, that is, too many elements fall into the same bucket, that bucket becomes "overloaded". The remedy is to increase the number of buckets, which is the value of size: elements that used to share a [hash % size] value may no longer do so, and collisions become sparser. Then check again whether every bucket can hold all the elements assigned to it. Because the number of buckets is capped, the search stops once size reaches the maximum without meeting this requirement; in the code analyzed below, that case logs a warning and falls back to the maximum number of buckets.
At the same time, each bucket must be large enough to hold at least one element, whichever element that is.
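
The search for the number of buckets described above can be sketched in standalone C. This is a simplified illustration rather than the Nginx code: find_bucket_count, DEMO_MAX_BUCKETS and the parameter names are hypothetical, element sizes are passed in already computed, and the function simply returns 0 when no bucket count up to the cap fits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_BUCKETS 64   /* hypothetical upper bound for this sketch */

/* Return the smallest bucket count in [start, max_size] for which no bucket
 * needs more than bucket_cap bytes, or 0 if no such count exists.
 * hashes[i] and sizes[i] describe element i. */
static size_t find_bucket_count(const size_t *hashes, const size_t *sizes,
                                size_t n, size_t start, size_t max_size,
                                size_t bucket_cap)
{
    size_t used[DEMO_MAX_BUCKETS];   /* bytes assigned to each bucket so far */
    size_t size, i, b;

    assert(start > 0 && max_size <= DEMO_MAX_BUCKETS);

    for (size = start; size <= max_size; size++) {
        int fits = 1;

        memset(used, 0, size * sizeof(used[0]));

        for (i = 0; i < n; i++) {
            b = hashes[i] % size;
            used[b] += sizes[i];
            if (used[b] > bucket_cap) {
                fits = 0;            /* bucket b is overloaded: try more buckets */
                break;
            }
        }

        if (fits) {
            return size;
        }
    }

    return 0;
}
```

For four 16-byte elements with hash values 0, 2, 4 and 6 and a 32-byte bucket capacity, bucket counts 1 and 2 put too many elements in bucket 0, and 3 is the first count that fits.
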
That concludes the overview. A detailed code analysis follows.

I. Data structure definitions
1. Elements of the basic hash table
Each element is a key-value pair.

typedef struct {
    void             *value;            // the value of the key-value pair
    u_short           len;              // the length of the key
    u_char            name[1];          // first byte of the key. A 1-length array is used so that the len bytes allocated later are contiguous with the structure (search for "0 or 1 length array")
} ngx_hash_elt_t;
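
To make the "header plus key in one contiguous block" idea concrete, here is a small standalone sketch. It is an illustration under assumptions, not Nginx code: demo_elt_t and demo_elt_create are hypothetical names, plain malloc stands in for the Nginx pool, and the C99 flexible array member name[] is used in place of name[1], which is the standard way to spell the same trick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ngx_hash_elt_t. */
typedef struct {
    void           *value;
    unsigned short  len;
    unsigned char   name[];    /* C99 flexible array member: key bytes follow */
} demo_elt_t;

/* Allocate one element with room for the whole key right after the header,
 * so header and key live in a single contiguous block.
 * The caller releases the result with free(). */
static demo_elt_t *demo_elt_create(const char *key, void *value)
{
    size_t      len = strlen(key);
    demo_elt_t *elt;

    assert(len <= 0xffff);     /* len must fit in an unsigned short */

    elt = malloc(sizeof(demo_elt_t) + len);
    if (elt == NULL) {
        return NULL;
    }

    elt->value = value;
    elt->len = (unsigned short) len;
    memcpy(elt->name, key, len);   /* the key is not NUL-terminated */

    return elt;
}
```

Because the key is stored without a terminator, the len field is what tells a reader where the key ends.
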

2. Basic hash table structure

typedef struct {
    ngx_hash_elt_t  **buckets;          // the hash table itself
    ngx_uint_t        size;             // the number of buckets in the hash table
} ngx_hash_t;

3. Hash table structure with wildcard support
This is the basic hash table plus an extra value pointer, which can point to user data when an ngx_hash_wildcard_t wildcard hash table is used as a container element.
It is not explained at length here.

typedef struct {
    ngx_hash_t        hash;
    void             *value;
} ngx_hash_wildcard_t;

4. Structure holding the data to be hashed, that is, the key-value pairs <key, value>
In practice, the key-value pairs are typically stored in an array of ngx_hash_key_t structures, which is passed as a parameter to ngx_hash_init() or ngx_hash_wildcard_init().
Each structure represents one element that will be added to the hash table.

typedef struct {
    ngx_str_t         key;
    ngx_uint_t        key_hash;         // the value the hash function computes from key. The element this structure represents is later inserted into bucket[key_hash % size]
    void             *value;
} ngx_hash_key_t;
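
As an illustration of where a key_hash value might come from, here is a standalone sketch of a multiply-by-31 string hash that lowercases ASCII letters first, which is the general shape of the case-insensitive key hash Nginx uses for server names. The names hash_step and hash_key_lc are hypothetical, and size_t stands in for ngx_uint_t.

```c
#include <assert.h>
#include <stddef.h>

/* One step of the multiplicative hash: key * 31 + c. */
static size_t hash_step(size_t key, unsigned char c)
{
    return key * 31 + c;
}

/* Hash a key case-insensitively: ASCII upper-case letters are lowered
 * before being mixed in, so "AB" and "ab" produce the same value. */
static size_t hash_key_lc(const unsigned char *data, size_t len)
{
    size_t i, key = 0;

    assert(data != NULL || len == 0);

    for (i = 0; i < len; i++) {
        unsigned char c = data[i];

        if (c >= 'A' && c <= 'Z') {
            c = (unsigned char) (c | 0x20);
        }

        key = hash_step(key, c);
    }

    return key;
}
```

The bucket index is then key_hash % size, so the same hash value can map to different buckets as size changes during initialization.
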

5. Structure for initializing the hash table

typedef struct {
    ngx_hash_t       *hash;
    ngx_hash_key_pt   key;              // the hash function

    ngx_uint_t        max_size;         // the maximum number of buckets
    ngx_uint_t        bucket_size;      // the capacity of each bucket

    char             *name;             // used in log messages
    ngx_pool_t       *pool;             // the memory pool used for allocation
    ngx_pool_t       *temp_pool;
} ngx_hash_init_t;

The relationship between the five structures above needs to be clear, so here is a picture ahead of the code. If the diagram is hard to follow, that is fine; just keep a rough impression of how the structures relate and continue with the source code analysis below.
The picture is from: http://blog.csdn.net/chen19870707/article/details/40794285 (it will be removed immediately if it infringes)

II. Initialization of the basic hash table
One more thing needs mentioning before walking through the initialization code: the size of each element in a bucket. Because the structure uses the 1-length array trick, sizeof alone no longer gives the amount of memory an element occupies.

typedef struct {
    void             *value;            // the value of the key-value pair
    u_short           len;              // the length of the key
    u_char            name[1];          // first byte of the key. A 1-length array is used so that the len bytes allocated later are contiguous with the structure (search for "0 or 1 length array")
} ngx_hash_elt_t;

So here's the macro:

#define NGX_HASH_ELT_SIZE(name)                                               \
    (sizeof(void *) + ngx_align((name)->key.len + 2, sizeof(void *)))

The macro computes the size as follows:
First comes the size of a void pointer (the value field), which is 8 bytes on a 64-bit machine. Then comes the key: its length is given by len, plus 2 bytes for the u_short len field itself. Simply adding these up would not respect memory alignment rules, so the sum is rounded up to a multiple of the pointer size.
For example, suppose we define the following structure:

struct test {
    int  x;
    char y;
};

After the compiler applies alignment, sizeof(struct test) is 8 (with a 4-byte int) rather than 5. Here, however, we manage the memory ourselves, so we must do the alignment ourselves. With 8-byte pointers, if the key length plus 2 is 13, it is rounded up to 16 bytes; if it is 17, it is rounded up to 24 bytes.
So this macro yields the amount of memory one element occupies.
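
The rounding arithmetic can be checked with a short standalone sketch. The helper names align_up and elt_size are hypothetical; align_up follows the usual power-of-two rounding idiom, and the pointer size is passed in explicitly so the numbers from the text (13 becomes 16, 17 becomes 24) can be reproduced on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* Round d up to the next multiple of a; a must be a power of two. */
static size_t align_up(size_t d, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (d + (a - 1)) & ~(a - 1);
}

/* Bytes one element occupies in a bucket: one value pointer, then the
 * 2-byte length field plus the key bytes, padded to pointer alignment. */
static size_t elt_size(size_t key_len, size_t ptr_size)
{
    return ptr_size + align_up(key_len + 2, ptr_size);
}
```

With 8-byte pointers, an 11-byte key such as "example.com" needs 8 + align_up(13, 8) = 8 + 16 = 24 bytes, and a 15-byte key needs 8 + align_up(17, 8) = 8 + 24 = 32 bytes.
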

Here is the hash table initialization code, with detailed comments:

// Parameters:
//   hinit: the structure used for initialization
//   names: the array of key-value pairs that will be added to the hash table
//   nelts: the number of elements that will be added to the hash table
ngx_int_t
ngx_hash_init(ngx_hash_init_t *hinit, ngx_hash_key_t *names, ngx_uint_t nelts)
{
    u_char          *elts;
    size_t           len;
    u_short         *test;
    ngx_uint_t       i, n, key, size, start, bucket_size;
    ngx_hash_elt_t  *elt, **buckets;

    // This loop checks that the configured bucket capacity can hold any one
    // of the elements (their lengths may differ). In other words, every
    // bucket must be able to hold at least one element, whichever it is.
    // If not, an error is returned.
    for (n = 0; n < nelts; n++) {
        // Once names[n] is stored as an ngx_hash_elt_t (the in-table form of
        // an element), its size must not exceed the bucket capacity;
        // otherwise the configured bucket size is too small.
        // sizeof(void *) is added to the macro result because, as the figure
        // at the start of the article shows, every bucket ends with a null
        // pointer after its elements.
        if (hinit->bucket_size < NGX_HASH_ELT_SIZE(&names[n]) + sizeof(void *))
        {
            ngx_log_error(NGX_LOG_EMERG, hinit->pool->log, 0,
                          "could not build the %s, you should "
                          "increase %s_bucket_size: %i",
                          hinit->name, hinit->name, hinit->bucket_size);
            return NGX_ERROR;
        }
    }

    // The test array is used heavily below. It has max_size entries, the
    // maximum number of buckets allowed, and its entries correspond
    // one-to-one to the buckets: test[i] goes with bucket[i].
    test = ngx_alloc(hinit->max_size * sizeof(u_short), hinit->pool->log);
    if (test == NULL) {
        return NGX_ERROR;
    }

    // bucket_size is the configured capacity of each bucket minus the size
    // of the trailing null pointer. That pointer is a sentinel that marks
    // the end of the elements in a bucket.
    bucket_size = hinit->bucket_size - sizeof(void *);

    // We do not know the number of buckets yet, so start from the smallest
    // possible size. With alignment, an ngx_hash_elt_t takes at least
    // 2 * 8 bytes, so use that to find the smallest size.
    start = nelts / (bucket_size / (2 * sizeof(void *)));
    start = start ? start : 1;

    // An empirical adjustment; I do not understand the reasoning.
    if (hinit->max_size > 10000 && nelts && hinit->max_size / nelts < 100) {
        start = hinit->max_size - 1000;
    }

    // start is the smallest possible size: that many buckets could hold
    // nelts elements if every element took only 16 bytes. That assumption
    // generally does not hold, so a more suitable size must be found.
    for (size = start; size <= hinit->max_size; size++) {

        // test[i] holds the bytes currently assigned to bucket[i]. If it
        // exceeds the bucket capacity, the number of buckets has to grow.
        // ngx_memzero resets every test[i] to 0.
        ngx_memzero(test, size * sizeof(u_short));

        for (n = 0; n < nelts; n++) {
            if (names[n].key.data == NULL) {
                continue;
            }

            key = names[n].key_hash % size;

            // the memory that must be stored in bucket[key]
            test[key] = (u_short) (test[key] + NGX_HASH_ELT_SIZE(&names[n]));

            // once it exceeds the capacity, this number of buckets is not enough
            if (test[key] > (u_short) bucket_size) {
                goto next;
            }
        }

        goto found;

    next:

        continue;
    }

    // No size up to max_size fits: use max_size, log a warning and ignore
    // the bucket size limit.
    size = hinit->max_size;

    ngx_log_error(NGX_LOG_WARN, hinit->pool->log, 0,
                  "could not build optimal %s, you should increase "
                  "either %s_max_size: %i or %s_bucket_size: %i; "
                  "ignoring %s_bucket_size",
                  hinit->name, hinit->name, hinit->max_size,
                  hinit->name, hinit->bucket_size, hinit->name);

found:

    // A size that satisfies the conditions has been found. test[i] still
    // corresponds to bucket[i]. Now work out how much memory all elements
    // need in total, then allocate it and hand each bucket its share.
    // First compute the memory each bucket needs, stored in test.
    // This time the null pointer has to be counted.
    for (i = 0; i < size; i++) {
        test[i] = sizeof(void *);
    }

    // compute the memory each bucket has to store, recorded in test
    for (n = 0; n < nelts; n++) {
        if (names[n].key.data == NULL) {
            continue;
        }

        key = names[n].key_hash % size;
        test[key] = (u_short) (test[key] + NGX_HASH_ELT_SIZE(&names[n]));
    }

    len = 0;

    // sum the memory needed by all elements, recorded in len
    for (i = 0; i < size; i++) {
        if (test[i] == sizeof(void *)) {
            continue;
        }

        test[i] = (u_short) (ngx_align(test[i], ngx_cacheline_size));

        len += test[i];
    }

    // If the hash table in the initialization structure does not exist,
    // we have to allocate it ourselves.
    if (hinit->hash == NULL) {
        // Note that this does not allocate just the basic hash table
        // structure but the wildcard hash table that contains it. The reason,
        // I think, is so that the same init can later serve hash tables with
        // wildcards. Since ngx_hash_wildcard_t contains a basic hash table
        // and is no harder to use, this seems the better choice.
        hinit->hash = ngx_pcalloc(hinit->pool, sizeof(ngx_hash_wildcard_t)
                                             + size * sizeof(ngx_hash_elt_t *));
        if (hinit->hash == NULL) {
            ngx_free(test);
            return NGX_ERROR;
        }

        // buckets, the double pointer declared at the start, points to the
        // array of ngx_hash_elt_t * placed right after the structure.
        buckets = (ngx_hash_elt_t **)
                      ((u_char *) hinit->hash + sizeof(ngx_hash_wildcard_t));

    } else {
        buckets = ngx_pcalloc(hinit->pool, size * sizeof(ngx_hash_elt_t *));
        if (buckets == NULL) {
            ngx_free(test);
            return NGX_ERROR;
        }
    }

    // allocate the memory needed by all elements
    elts = ngx_palloc(hinit->pool, len + ngx_cacheline_size);
    if (elts == NULL) {
        ngx_free(test);
        return NGX_ERROR;
    }

    elts = ngx_align_ptr(elts, ngx_cacheline_size);

    // Hand each bucket its memory. The size each bucket needs was
    // recorded in test above.
    for (i = 0; i < size; i++) {
        if (test[i] == sizeof(void *)) {
            continue;
        }

        buckets[i] = (ngx_hash_elt_t *) elts;

        // advance by the recorded size of this bucket
        elts += test[i];
    }

    // Every bucket now has its memory, so the key-value data can be moved
    // in. test[i] still corresponds to bucket[i], but now it records how
    // many bytes of the bucket have been filled so far: the next element
    // of bucket[i] starts at the bucket's address plus test[i].
    for (i = 0; i < size; i++) {
        test[i] = 0;
    }

    for (n = 0; n < nelts; n++) {
        if (names[n].key.data == NULL) {
            continue;
        }

        key = names[n].key_hash % size;
        elt = (ngx_hash_elt_t *) ((u_char *) buckets[key] + test[key]);

        elt->value = names[n].value;
        elt->len = (u_short) names[n].key.len;

        // copy the key, converting upper-case letters to lower case
        ngx_strlow(elt->name, names[n].key.data, names[n].key.len);

        test[key] = (u_short) (test[key] + NGX_HASH_ELT_SIZE(&names[n]));
    }

    // Add the null pointer at the end of each bucket. It is treated as an
    // ngx_hash_elt_t whose first member happens to be a void pointer; only
    // that member is touched, so nothing is written out of bounds.
    for (i = 0; i < size; i++) {
        if (buckets[i] == NULL) {
            continue;
        }

        elt = (ngx_hash_elt_t *) ((u_char *) buckets[i] + test[i]);

        elt->value = NULL;
    }

    ngx_free(test);

    hinit->hash->buckets = buckets;
    hinit->hash->size = size;

    return NGX_OK;
}
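
The sizing pass in the middle of the function (count the terminator, round each non-empty bucket up to the cache line, sum the total) can be isolated in a standalone sketch. The names round_up and layout_buckets are hypothetical, and the pointer and cache line sizes are plain parameters here rather than sizeof(void *) and ngx_cacheline_size.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of unit (unit > 0). */
static size_t round_up(size_t n, size_t unit)
{
    assert(unit > 0);
    return (n + unit - 1) / unit * unit;
}

/* elt_bytes[i] is the total element size assigned to bucket i (0 if empty).
 * Writes each bucket's final size to bucket_bytes[i] and returns the sum.
 * A non-empty bucket needs its elements plus one pointer-sized terminator,
 * rounded up to the cache line size; an empty bucket needs nothing. */
static size_t layout_buckets(const size_t *elt_bytes, size_t *bucket_bytes,
                             size_t size, size_t ptr_size, size_t cacheline)
{
    size_t i, total = 0;

    for (i = 0; i < size; i++) {
        if (elt_bytes[i] == 0) {
            bucket_bytes[i] = 0;
            continue;
        }

        bucket_bytes[i] = round_up(elt_bytes[i] + ptr_size, cacheline);
        total += bucket_bytes[i];
    }

    return total;
}
```

With 8-byte pointers and 64-byte cache lines, a bucket holding 24 bytes of elements takes 64 bytes, and one holding 64 bytes of elements takes 128 because the terminator pushes it past the first cache line.
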

I am still somewhat unclear on memory allocation, memory address alignment and cache line alignment, so some places are left unexplained. My foundations are not solid enough yet, but I am sharing this anyway.

III. Lookup in the basic hash table

// Find in the hash table the value that corresponds to the key
// described by key (the hash value), name and len.
void *
ngx_hash_find(ngx_hash_t *hash, ngx_uint_t key, u_char *name, size_t len)
{
    ngx_uint_t       i;
    ngx_hash_elt_t  *elt;

    elt = hash->buckets[key % hash->size];

    if (elt == NULL) {
        return NULL;
    }

    // search the bucket until the value in an ngx_hash_elt_t is NULL
    while (elt->value) {
        if (len != (size_t) elt->len) {             // compare the lengths first
            goto next;
        }

        for (i = 0; i < len; i++) {
            if (name[i] != elt->name[i]) {          // then compare the contents of name, byte by byte
                goto next;
            }
        }

        return elt->value;

    next:

        // move the address on to the next ngx_hash_elt_t structure
        elt = (ngx_hash_elt_t *) ngx_align_ptr(&elt->name[0] + elt->len,
                                               sizeof(void *));
        continue;
    }

    return NULL;
}

This code needs little explanation and is not difficult to understand.
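
To see the bucket walk in isolation, here is a standalone sketch that packs elements into a byte buffer in the same order as the fields described earlier (value pointer, 2-byte length, key bytes, padding to pointer size) and then scans them until the null terminator. It is an illustration under assumptions: bucket_put, bucket_end, bucket_find and pad are hypothetical names, and fields are read and written with memcpy at byte offsets instead of through a struct pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PTR_SIZE sizeof(void *)

/* Round n up to a multiple of the pointer size. */
static size_t pad(size_t n)
{
    return (n + PTR_SIZE - 1) / PTR_SIZE * PTR_SIZE;
}

/* Append one element at offset off and return the offset of the next one.
 * The caller must make sure buf is large enough. */
static size_t bucket_put(unsigned char *buf, size_t off,
                         const char *key, void *value)
{
    uint16_t len;

    assert(value != NULL && strlen(key) <= 0xffff);
    len = (uint16_t) strlen(key);

    memcpy(buf + off, &value, PTR_SIZE);
    memcpy(buf + off + PTR_SIZE, &len, sizeof(len));
    memcpy(buf + off + PTR_SIZE + sizeof(len), key, len);

    return off + PTR_SIZE + pad(sizeof(len) + len);
}

/* Terminate the bucket with a null value pointer. */
static void bucket_end(unsigned char *buf, size_t off)
{
    void *null = NULL;

    memcpy(buf + off, &null, PTR_SIZE);
}

/* Scan the bucket: compare lengths first, then bytes; stop at the terminator. */
static void *bucket_find(const unsigned char *buf,
                         const char *key, size_t key_len)
{
    size_t off = 0;

    for (;;) {
        void     *value;
        uint16_t  len;

        memcpy(&value, buf + off, PTR_SIZE);
        if (value == NULL) {
            return NULL;
        }

        memcpy(&len, buf + off + PTR_SIZE, sizeof(len));
        if (len == key_len
            && memcmp(buf + off + PTR_SIZE + sizeof(len), key, len) == 0)
        {
            return value;
        }

        off += PTR_SIZE + pad(sizeof(len) + len);
    }
}
```

The stride between elements is the same quantity the NGX_HASH_ELT_SIZE macro computes, which is why the lookup can step from one element to the next without any stored pointers.
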

Finally, a few blog posts I drew on:
http://blog.csdn.net/livelylittlefish/article/details/6636229
http://blog.csdn.net/chen19870707/article/details/40794285
http://www.linuxidc.com/Linux/2012-08/67040.htm
http://blog.chinaunix.net/uid-27767798-id-3766755.html
Thanks.

