Operating Systems: Analysis of Hash Table Usage in the Linux Kernel

Source: Internet
Author: User

1. Basic Concepts

A hash table (also called a hash list) is a data structure that is accessed directly by key value.

That is, it locates a record by mapping the key to a position in the table, which speeds up lookup.

This mapping function is called a hash function. The array that holds the records is called the hash table.

2. Frequently used methods for constructing hash functions
A hash function makes access to a set of data more efficient: it maps a key directly to a position in the table, so data elements are located more quickly. Commonly used methods for constructing hash functions are:
(1) Direct addressing method
(2) Digit analysis method
(3) Mid-square method
(4) Folding method
(5) Random number method
(6) Division-remainder method
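As an illustration of method (6), the division-remainder method maps a key to its remainder modulo the table length. The following is only a minimal sketch; the name hash_mod is an assumption made for this example and is not kernel code.

```c
#include <assert.h>

/* Division-remainder hashing: slot index = key mod table length.
 * hash_mod is a hypothetical name used only in this sketch. */
static unsigned int hash_mod(unsigned int key, unsigned int table_len)
{
	assert(table_len != 0);	/* a table needs at least one slot */
	return key % table_len;
}
```

With a table length of 7, the keys 10 and 17 both land in slot 3, which is exactly the kind of conflict that the next section deals with.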
3. Methods of dealing with conflicts
Hash functions are designed to reduce conflicts (collisions), but conflicts cannot be avoided altogether. Common conflict handling methods are:
(1) Open addressing method
(2) Re-hashing method
(3) Chaining method (also called the zipper method)
(4) Establishing a public overflow area
4. Hash table Lookup Performance Analysis
The hash table lookup process is essentially the same as the table construction process.

Some keys can be found directly at the address produced by the hash function. Other keys collide at that address and have to be found by following the conflict handling method.

In all of the methods of dealing with conflicts, the post-conflict lookup is still a process of comparing the given value with keys. So the efficiency of a hash table lookup is still measured by the average search length.


The number of key comparisons during a lookup depends on how many conflicts occur: few conflicts mean high search efficiency, and many conflicts mean low search efficiency. Therefore, the factors that affect the number of conflicts are the factors that affect search efficiency.

There are three factors that affect the number of conflicts:
1. Whether the hash function is uniform;
2. The method of dealing with conflicts;
3. The load factor of the hash table.


The load factor of a hash table is defined as: α = the number of elements in the table / the length of the hash table.
α indicates how full the hash table is. Because the table length is fixed, α is proportional to the number of elements in the table: the larger α is, the more elements are in the table and the more likely a conflict is; the smaller α is, the fewer elements are in the table and the less likely a conflict is. In fact, the average search length of a hash table is a function of the load factor α; different methods of dealing with conflicts simply give different functions.
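The definition of α can be worked through numerically. The sketch below computes the load factor and, as an assumption not taken from this article, uses the usual textbook approximation 1 + α/2 for the average search length of a successful lookup under chaining. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Load factor: alpha = number of stored elements / table length. */
static double load_factor(size_t n_elements, size_t table_len)
{
	assert(table_len != 0);
	return (double)n_elements / (double)table_len;
}

/* Textbook estimate (an assumption of this sketch) of the average
 * number of comparisons for a successful lookup under chaining. */
static double chaining_avg_search_len(double alpha)
{
	return 1.0 + alpha / 2.0;
}
```

For example, 12 elements in a table of length 16 give α = 0.75, and the chaining estimate is then 1.375 comparisons per successful lookup.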
One. Linux kernel hash table data structure
The most important thing is to choose an appropriate hash function, so that keys are distributed evenly over the buckets and the time for lookup, insertion and deletion is optimized. However, no matter which hash function is used, conflicts can occur.

The kernel resolves hash conflicts by chaining (the zipper method): all keys that are synonyms, that is, that hash to the same address, are linked into the same linked list. If the chosen hash table length is m, the hash table can be defined as an array of m head pointers, T[0..m-1] (each of type struct hlist_head). Every node whose hash address is i is inserted into the linked list whose head pointer is T[i]. The initial value of each element of T should be a null pointer. With chaining, the load factor α (the number of elements stored / the length of the array) can be greater than 1, but generally α ≤ 1 is chosen. Of course, chaining also has a drawback: the pointers need extra space.
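The array T[0..m-1] described above can be sketched in user space as follows. This is an illustration under assumptions, not kernel code: TABLE_LEN, struct item, table_insert and bucket_len are hypothetical names, and the two hlist types are copied by hand so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* User-space copies of the kernel's two hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define TABLE_LEN 8	/* m: hypothetical bucket count for this sketch */

/* T[0..m-1]: static storage, so every head pointer starts out NULL. */
static struct hlist_head table[TABLE_LEN];

struct item {			/* hypothetical record type */
	unsigned int key;
	struct hlist_node link;
};

/* Insert at the front of the chain whose index is key mod m. */
static void table_insert(struct item *it)
{
	struct hlist_head *h = &table[it->key % TABLE_LEN];

	it->link.next = h->first;
	if (h->first)
		h->first->pprev = &it->link.next;
	h->first = &it->link;
	it->link.pprev = &h->first;
}

/* Count the nodes chained in one bucket. */
static size_t bucket_len(unsigned int idx)
{
	size_t len = 0;
	struct hlist_node *pos;

	for (pos = table[idx % TABLE_LEN].first; pos; pos = pos->next)
		len++;
	return len;
}
```

Keys 3 and 11 are synonyms for a table of length 8, so inserting both leaves two nodes in the chain of T[3].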


1. The code is in include/linux/list.h; the 3.0 kernel places the data structure definitions in include/linux/types.h.
Data structure definition of the hash table:


struct hlist_head {
	struct hlist_node *first;
};

struct hlist_node {
	struct hlist_node *next, **pprev;
};


1> hlist_head represents the head node of a hash list. Each entry (bucket) in the hash table corresponds to one linked list (hlist). The hlist_head structure has only one field, first, which points to the first node of the hlist.


2> The hlist_node structure has two fields, next and pprev.
(1) next points to the next hlist_node. If the node is the last one in the linked list, next is NULL.

(2) pprev is a pointer to a pointer. It points to the next pointer of the previous node (or to the first pointer of the head, if the node is the first one).


2. The hlist (hash list) and the list in Linux are not the same. In a list, every node is the same: the head node and the other nodes are represented by the same structure. In an hlist, the head node is represented by struct hlist_head, and the other nodes by struct hlist_node.

Also, list is a doubly linked circular list, while hlist is not circular, because the hlist head node has no prev pointer. Why is it designed like this?

The purpose of a hash table is fast lookup, so the hash table is generally a large array; otherwise the probability of collision is high and the hash table loses its point. How can a large table be maintained without occupying too much memory?

By storing only one pointer in each entry (head node) of the hash table. This saves half of the pointer space in the heads, which matters when there are very many buckets. (A head with two pointer fields would occupy 8 bytes, assuming 4-byte pointers.)
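The space argument can be checked with sizeof. The sketch below copies the head types into user space; list_table_bytes and hlist_table_bytes are hypothetical helper names introduced only for this comparison.

```c
#include <assert.h>
#include <stddef.h>

struct list_head  { struct list_head *next, *prev; };	/* list: two pointers per head */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };	/* hlist: one pointer per head */

/* Bytes needed for the bucket array alone if heads were list_head. */
static size_t list_table_bytes(size_t buckets)
{
	return buckets * sizeof(struct list_head);
}

/* Bytes needed for the bucket array with hlist_head heads. */
static size_t hlist_table_bytes(size_t buckets)
{
	return buckets * sizeof(struct hlist_head);
}
```

Whatever the pointer size of the platform, the hlist bucket array needs half the bytes of an equally long array of list_head heads.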


3. An hlist node has two pointers, but pprev is a pointer to a pointer: it points to the next pointer of the previous node. Why use pprev instead of a single-level prev pointer?
Because hlist is not a complete circular list. In list, the head and the nodes are the same data structure, so using prev directly is fine. In hlist, the head has no prev, only first.
1> Inserting at the head must change the first pointer of the head so that it points to the newly inserted node, and this should be done in a uniform way. That is why hlist has pprev.

The pprev of an hlist node does not point to the previous node itself but to the next (or first) pointer inside the previous element (which may be the head). Head insertion can therefore access and change the previous element's next (or first) pointer uniformly through node->pprev.

2> This also overcomes the inconsistency of the two data structures: hlist_node stores in pprev the address of the previous element's next pointer, and since first in hlist_head and next in hlist_node have the same pointer type (struct hlist_node *), the same code works for both.


Two. Declaration and initialization macros for a hash table
1. Initializing the hash list head node

In fact, struct hlist_head only defines the type of a list head and does not itself create one; a head can be defined and initialized with the following three macros:
#define HLIST_HEAD_INIT { .first = NULL }
#define HLIST_HEAD(name) struct hlist_head name = { .first = NULL }
#define INIT_HLIST_HEAD(ptr) ((ptr)->first = NULL)
1> name is a variable of type struct hlist_head.
2> The HLIST_HEAD_INIT macro only initializes.
E.g.: struct hlist_head my_hlist = HLIST_HEAD_INIT;
HLIST_HEAD_INIT only initializes the my_hlist head node, setting the first field of the head to NULL.
3> The HLIST_HEAD(name) macro both declares and initializes.


E.g.: HLIST_HEAD(my_hlist);
The HLIST_HEAD macro declares the my_hlist head node and initializes it, setting the first field of the head to NULL.


4> The HLIST_HEAD and HLIST_HEAD_INIT macros initialize statically at compile time; INIT_HLIST_HEAD can be used to initialize at run time.
E.g.:
INIT_HLIST_HEAD(&my_hlist);
INIT_HLIST_HEAD initializes my_hlist by setting its first field to NULL.
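The three macros can be tried out in user space as sketched below. The macro bodies follow the definitions quoted above; head_a, head_b and reset_head are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define HLIST_HEAD_INIT		{ .first = NULL }
#define HLIST_HEAD(name)	struct hlist_head name = { .first = NULL }
#define INIT_HLIST_HEAD(ptr)	((ptr)->first = NULL)

/* Static initialization, resolved at compile time. */
static struct hlist_head head_a = HLIST_HEAD_INIT;
static HLIST_HEAD(head_b);

/* Run-time initialization, e.g. for a head inside allocated memory. */
static void reset_head(struct hlist_head *h)
{
	INIT_HLIST_HEAD(h);
}
```

The static forms suit global or file-scope heads, while the run-time form is the one to use for heads that live in memory obtained later, such as a dynamically allocated bucket array.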


2. Initializing a hash list node
1> Linux provides an interface for hash list node initialization:
static inline void INIT_HLIST_NODE(struct hlist_node *h)
(1) h: the hash list node
2> Implementation:
static inline void INIT_HLIST_NODE(struct hlist_node *h)
{
	h->next = NULL;
	h->pprev = NULL;
}

This inline function initializes a struct hlist_node by setting both its next field and its pprev field to NULL.


Three. Basic operations on a hash list (insert, delete, test for empty)
1. Testing whether the hash list is empty

1> Function: tests whether the hash list is empty; returns 1 if it is empty, otherwise 0.
2> Function interface:
static inline int hlist_empty(const struct hlist_head *h)
h: points to the head node of the hash list.
3> Function implementation:
static inline int hlist_empty(const struct hlist_head *h)
{
	return !h->first;
}
The function tests whether the first field of the head node is NULL.

If first is NULL, the hash list is empty.
2. Testing whether a node is in a hash list
1> Function: tests whether the node is on a hash list; returns 1 if it is not on any list, otherwise 0.
2> Function interface:
static inline int hlist_unhashed(const struct hlist_node *h)
h: points to the hash list node.
3> Function implementation:
static inline int hlist_unhashed(const struct hlist_node *h)
{
	return !h->pprev;
}
The function decides whether the node is in a hash list by testing whether the node's pprev is NULL. h->pprev holds the address of the next field of the node before h (or of the first field of the head). If pprev is NULL, there is no such predecessor, so the node is not in any hash list.
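The two tests can be exercised in user space as sketched below. The structures and the three kernel-named functions are copied by hand; link_single is a hypothetical helper that links one node into an empty list so that both states can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void INIT_HLIST_NODE(struct hlist_node *h)
{
	h->next = NULL;
	h->pprev = NULL;
}

/* Returns 1 when the list has no nodes. */
static int hlist_empty(const struct hlist_head *h)
{
	return !h->first;
}

/* Returns 1 when the node is NOT on any list (pprev is NULL). */
static int hlist_unhashed(const struct hlist_node *h)
{
	return !h->pprev;
}

/* Hypothetical helper: make n the only node of the empty list h. */
static void link_single(struct hlist_node *n, struct hlist_head *h)
{
	n->next = NULL;
	n->pprev = &h->first;
	h->first = n;
}
```

Note the inverted sense of hlist_unhashed: a freshly initialized node reports 1, and a linked node reports 0.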


3. Hash list delete operation
1> Function: deletes a node from the hash list.
2> Function interface:
static inline void hlist_del(struct hlist_node *n)
n: points to a node of the hlist
static inline void hlist_del_init(struct hlist_node *n)
n: points to a node of the hlist
3> Function implementation
static inline void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;
	*pprev = next;
	if (next)
		next->pprev = pprev;
}
Step 1: get next, the node after n.
Step 2: n->pprev holds the address of the next pointer of the node before n, so *pprev is that next pointer itself (which currently points to n).
Step 3: *pprev = next links the node before n to the node after n.


Step 4: if n is the last node of the linked list, n->next is NULL and nothing more is done. Otherwise next->pprev = pprev makes the pprev of the node after n take over the pprev value of n.
Suppose instead that hlist_node used a single-level prev pointer. How would deletion work?
Step 3 would then have to test whether the node is the first node.

Whether n->prev is NULL can be used to distinguish the first node from an ordinary node (head here is the list head, which this variant would additionally need):
struct my_hlist_node *next = n->next;
struct my_hlist_node *prev = n->prev;
if (prev)
	prev->next = next;
else
	head->first = next;
if (next)
	next->prev = prev;
Then why not do it this way?
(1) The code is less concise. With the pprev in hlist_node, the head node and ordinary nodes are handled in exactly the same way.
static inline void hlist_del(struct hlist_node *n)
{
	__hlist_del(n);
	n->next = LIST_POISON1;
	n->pprev = LIST_POISON2;
}
Step 1: call __hlist_del(n) to unlink node n from the hash list (that is, relink the node before n and the node after n).
Steps 2 and 3: set the next and pprev fields of node n to LIST_POISON1 and LIST_POISON2 respectively.

These values are set so that a node that is no longer in a list cannot be accessed through its stale pointers.


static inline void hlist_del_init(struct hlist_node *n)
{
	if (!hlist_unhashed(n)) {
		__hlist_del(n);
		INIT_HLIST_NODE(n);
	}
}
Step 1: first test whether the node is in a hash list. If it is not, nothing is deleted.

If it is, go on to step 2.
Step 2: call __hlist_del to unlink node n.
Step 3: call INIT_HLIST_NODE to reinitialize node n.
Note:
Both hlist_del and hlist_del_init call __hlist_del to unlink node n.

The only difference is how node n is treated afterwards: the former marks n as unusable (poisoned), the latter resets it to an empty, initialized node.
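The difference between the two delete functions can be observed in a user-space sketch. The poison macros below are stand-in values chosen for the example (the kernel defines its own in include/linux/poison.h), and the functions are hand copies of the kernel logic described above.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Stand-in poison values for this sketch only. */
#define LIST_POISON1 ((void *)0x00100100)
#define LIST_POISON2 ((void *)0x00200200)

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n; works the same for the first node and for inner nodes. */
static void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
}

/* Unlink and poison: the node must not be used again as is. */
static void hlist_del(struct hlist_node *n)
{
	__hlist_del(n);
	n->next = LIST_POISON1;
	n->pprev = LIST_POISON2;
}

/* Unlink and reinitialize: safe to call again or to re-add. */
static void hlist_del_init(struct hlist_node *n)
{
	if (n->pprev) {		/* same test as !hlist_unhashed(n) */
		__hlist_del(n);
		n->next = NULL;
		n->pprev = NULL;
	}
}
```

Because hlist_del_init leaves pprev as NULL, calling it a second time on the same node is a harmless no-op, whereas calling hlist_del twice would dereference the poison value.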


4. Adding a hash node

1> Function: adds a node to the hash list.
hlist_add_head: node n is inserted directly after the head h, becoming the first node.
hlist_add_before: node n is inserted before node next (next is already in the hash list).
hlist_add_after: node next is inserted after node n (n is already in the hash list).
The 3.0 kernel adds a new hlist_add_fake function.
2> The Linux kernel provides three interfaces:
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
struct hlist_node *n: the hash node to be inserted.
struct hlist_head *h: the head node of the hash list.
static inline void hlist_add_before(struct hlist_node *n, struct hlist_node *next)
struct hlist_node *n: the hash node to be inserted.
struct hlist_node *next: a hash node already in the hash list.


static inline void hlist_add_after(struct hlist_node *n, struct hlist_node *next)
struct hlist_node *n: a hash node already in the hash list.
struct hlist_node *next: the hash node to be inserted.
Note: hlist_add_fake is new in the 3.0 kernel:
static inline void hlist_add_fake(struct hlist_node *n)
struct hlist_node *n: a hash list node.
3> Function implementation:
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;
	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}
Step 1: first = h->first; gets the current first node of the list.
Step 2: assigns first to the next field of n, linking n to the old first node.
Step 3: if first is not NULL, the pprev of first is pointed at the next field of n. This completes the links of the old first node.
If first is NULL, this step is skipped.
Step 4: h->first = n; points the first field of the head at n, making n the first node of the list.


Step 5: n->pprev = &h->first; points the pprev of n at the first field of the head. The links of n are now complete.
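The five steps can be followed in a small user-space sketch. hlist_add_head is a hand copy of the kernel function above; hlist_len is a hypothetical helper added only to inspect the result.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;			/* step 2 */
	if (first)
		first->pprev = &n->next;	/* step 3 */
	h->first = n;				/* step 4 */
	n->pprev = &h->first;			/* step 5 */
}

/* Hypothetical helper: number of nodes on the list. */
static size_t hlist_len(const struct hlist_head *h)
{
	size_t len = 0;
	const struct hlist_node *pos;

	for (pos = h->first; pos; pos = pos->next)
		len++;
	return len;
}
```

Since every insertion goes to the front, adding a, b and c in that order yields the list c, b, a.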


/* next must be != NULL */

static inline void hlist_add_before(struct hlist_node *n, struct hlist_node *next)
{
	n->pprev = next->pprev;
	n->next = next;
	next->pprev = &n->next;
	*(n->pprev) = n;
}
Step 1: n->pprev = next->pprev; assigns the pprev of next to n->pprev, so the pprev of n points to the next field of the node before next.
Step 2: n->next = next; points the next of n to next. The links of n itself are now set.
Step 3: next->pprev = &n->next; changes the pprev of next so that it holds the address of the next field of n. The links of next are complete at this point.


Step 4: *(n->pprev) = n; at this point *(n->pprev) is the next field of the node that was before next, and it is made to point to n. The insertion of n is complete.
Note:
(1) next cannot be NULL (next is a node in the hash list).
(2) n is the newly inserted node.


static inline void hlist_add_after(struct hlist_node *n, struct hlist_node *next)
{
	next->next = n->next;
	n->next = next;
	next->pprev = &n->next;
	if (next->next)
		next->next->pprev = &next->next;
}
n is a node already in the hash list, and next is the newly inserted node.

The function inserts node next after n.
Step 1: next->next = n->next; points next->next at the node after n.
Step 2: n->next = next; changes the next of n so that n points to next.
Step 3: next->pprev = &n->next; points the pprev of next at the next field of n.
Step 4: test whether the node after next is NULL. If it is, nothing is done; otherwise the pprev of the node after next is pointed at the next field of next.
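The two positional inserts can be traced in user space as sketched below. The three functions are hand copies of the kernel versions discussed above, with an assert standing in for the "next must be != NULL" requirement.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Insert n in front of next; next must already be on a list. */
static void hlist_add_before(struct hlist_node *n, struct hlist_node *next)
{
	assert(next != NULL);
	n->pprev = next->pprev;
	n->next = next;
	next->pprev = &n->next;
	*(n->pprev) = n;
}

/* Insert next behind n; n must already be on a list. */
static void hlist_add_after(struct hlist_node *n, struct hlist_node *next)
{
	next->next = n->next;
	n->next = next;
	next->pprev = &n->next;
	if (next->next)
		next->next->pprev = &next->next;
}
```

Note that hlist_add_before needs no special case when next is the first node: n->pprev then points at the head's first field, and step 4 updates the head through it.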


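static inline void hlist_add_fake(struct hlist_node *n)
{
	n->pprev = &n->next;
}

This function points the pprev of n at n's own next field. After that, the node appears to be on some hlist (hlist_unhashed returns 0) and hlist_del works on it, even though it is not linked into any real list.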
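One reading of hlist_add_fake, consistent with the comment that accompanies it in the kernel source (after the call the node appears to be on some hlist and hlist_del will work), can be sketched in user space. The functions below are hand copies for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };

static int hlist_unhashed(const struct hlist_node *h)
{
	return !h->pprev;
}

/* Make n its own predecessor: pprev points at n's own next field. */
static void hlist_add_fake(struct hlist_node *n)
{
	n->pprev = &n->next;
}

/* Unlink n; with a fake node this only writes back into n itself. */
static void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
}
```

This lets code that unconditionally deletes a node run safely on an object that was never inserted into a hash table.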


Three. Other operations of the hash list
1. Moving a hash list
1> Function: moves a hash list from its old head node to a new head node. The old head is left empty.
2> Interface:
static inline void hlist_move_list(struct hlist_head *old, struct hlist_head *new)
struct hlist_head *old: the head node of the original hash list
struct hlist_head *new: the head node that takes over the hash list
3> Implementation:
static inline void hlist_move_list(struct hlist_head *old, struct hlist_head *new)
{
	new->first = old->first;
	if (new->first)
		new->first->pprev = &new->first;
	old->first = NULL;
}
Step 1: points the first field of the new head at the first node of the old list.
Step 2: tests whether there is a hash node after the head.

If there is none, nothing is done. Otherwise, the pprev of the first node is pointed at the first field of the new head.

Step 3: sets the first field of the original head to NULL.
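The three steps can be checked in a user-space sketch. Both functions are hand copies of the kernel logic shown above.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Hand the whole chain over from old to new and empty old. */
static void hlist_move_list(struct hlist_head *old, struct hlist_head *new)
{
	new->first = old->first;
	if (new->first)
		new->first->pprev = &new->first;	/* re-anchor first node */
	old->first = NULL;
}
```

Only the first node has to be touched, because it is the only node whose pprev refers to the head; all other nodes keep pointing at their predecessors.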


Four. Traversal of a hash list
To make traversal convenient for kernel code, the Linux list abstracts the traversal operations into several macros. Before analyzing the traversal macros, let us look at how to get from the linked list to the data item we need.
1. hlist_entry(ptr, type, member)
1> Function: obtains a pointer to the whole structure from a pointer to one of its members.
The Linux linked list stores only the address of the hlist_node member embedded in the data structure. Through the hlist_entry macro, that member's address gives access to the data structure that contains it.
2> Interface:
hlist_entry(ptr, type, member)
ptr: a pointer to the hlist_node member in the data structure, that is, the address of the list node stored inside the data structure.


type: the type of the data structure.


member: the name of the hlist_node member in the definition of the data structure type.
3> Implementation of the hlist_entry macro
#define hlist_entry(ptr, type, member) \
	container_of(ptr, type, member)
The hlist_entry macro simply expands to the container_of macro.
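The pointer arithmetic behind hlist_entry can be sketched in user space. The container_of below is a simplified version written for this example (the kernel's version additionally checks the pointer type), and struct task_info and pid_of are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };

/* Simplified container_of: step back from the member to the start
 * of the enclosing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define hlist_entry(ptr, type, member) container_of(ptr, type, member)

struct task_info {		/* hypothetical record type */
	int pid;
	struct hlist_node hash_link;
};

/* Recover the record from its embedded list node and read a field. */
static int pid_of(struct hlist_node *link)
{
	return hlist_entry(link, struct task_info, hash_link)->pid;
}
```

This is why the list code never needs to know the record type: the node is embedded in the record, and the record is found again by subtracting the member's offset.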
2. Traversal operations
1> Function: the macro is actually a for loop that traverses the list from beginning to end.

Because hlist is not a circular list, the loop continues as long as pos is not NULL.

pos must not be deleted while traversing with hlist_for_each (pos->next must remain valid), or an invalid memory access (SIGSEGV) will result.

With hlist_for_each_safe, nodes can be deleted while traversing.
2> Interface:
The Linux kernel provides two interfaces for hash list traversal:
hlist_for_each(pos, head)
pos: an auxiliary pointer (of type struct hlist_node *) used as the traversal cursor
head: the head pointer of the list (that is, the struct hlist_head).
hlist_for_each_safe(pos, n, head)
pos: an auxiliary pointer (of type struct hlist_node *) used as the traversal cursor
n: a temporary hash node pointer (struct hlist_node *) used to hold the node after pos.
head: the head pointer of the list (that is, the struct hlist_head).
3> Function implementation:
(1) #define hlist_for_each(pos, head) \
	for (pos = (head)->first; pos; pos = pos->next)
pos is an auxiliary pointer. It starts at the first hash node (the head itself is not visited), and the loop ends when pos becomes NULL.
(2) #define hlist_for_each_safe(pos, n, head) \
	for (pos = (head)->first; pos && ({ n = pos->next; 1; }); pos = n)
hlist_for_each traverses by advancing the pos pointer. However, if the loop body deletes the node that pos points to, the advance of pos breaks, because hlist_del(pos) sets the next and pprev of pos to the special values LIST_POISON1 and LIST_POISON2. The caller could of course cache the next pointer itself to keep the traversal going, but for consistency the Linux kernel hash list asks the caller to supply an additional pointer n of the same type as pos. The address of the node after pos is saved in n inside the for loop, which avoids the broken chain caused by the pos node being removed.
The loop condition is pos && ({ n = pos->next; 1; }).
It first tests whether pos is NULL; if it is, evaluation stops there and the loop ends.

If pos is not NULL, ({ n = pos->next; 1; }) is evaluated. This is a compound statement expression whose value is that of its last statement, so it is always true; as a side effect it stores the node after pos in n. In other words, the loop condition really only tests whether pos is non-NULL.

({ n = pos->next; 1; }) is a GCC-specific C extension (a statement expression). If it is unfamiliar, see the GCC documentation on C extensions.
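Deleting while traversing can be sketched in user space as follows. As an assumption of this sketch, the safe macro uses the standard comma operator in place of the GCC statement expression, which has the same effect here; count_nodes and remove_all are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define hlist_for_each(pos, head) \
	for (pos = (head)->first; pos; pos = pos->next)

/* Portable variant: (n = pos->next, 1) saves the successor and is
 * always true, like the kernel's ({ n = pos->next; 1; }). */
#define hlist_for_each_safe(pos, n, head) \
	for (pos = (head)->first; pos && ((n) = pos->next, 1); pos = n)

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n and reset its pointers, as hlist_del_init does. */
static void hlist_del_init(struct hlist_node *n)
{
	if (n->pprev) {
		*n->pprev = n->next;
		if (n->next)
			n->next->pprev = n->pprev;
		n->next = NULL;
		n->pprev = NULL;
	}
}

static size_t count_nodes(struct hlist_head *h)
{
	struct hlist_node *pos;
	size_t count = 0;

	hlist_for_each(pos, h)
		count++;
	return count;
}

/* Empty the list; n keeps the successor alive across each delete. */
static size_t remove_all(struct hlist_head *h)
{
	struct hlist_node *pos, *n;
	size_t removed = 0;

	hlist_for_each_safe(pos, n, h) {
		hlist_del_init(pos);	/* pos->next is wiped here */
		removed++;
	}
	return removed;
}
```

Replacing hlist_for_each_safe with hlist_for_each in remove_all would stop after the first node, because pos->next is already NULL by the time the loop advances.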


Five. Traversing by the address of the containing structure rather than the address of the list node
Linux provides three ways to traverse: from the first hash node of the hash list, from the node after pos in the hash list, and from the current node pos in the hash list.
1. Starting from the first hash node of the hash list
1> Function: traverses from the first hash node of a hash list. With hlist_for_each_entry, pos must not be deleted while traversing (pos->next must remain valid), otherwise an invalid memory access (SIGSEGV) will result.

With hlist_for_each_entry_safe, nodes can be deleted while traversing.
2> Linux provides two interfaces for traversing from the first node of a hash list:
hlist_for_each_entry(tpos, pos, head, member)
tpos: a traversal pointer whose data type is the containing structure type (type *) rather than struct hlist_node *
pos: a traversal pointer whose data type is struct hlist_node *
head: the head node of the hash list
member: the name of the hlist_node member in the definition of the data structure type
hlist_for_each_entry_safe(tpos, pos, n, head, member)
tpos: a traversal pointer whose data type is the containing structure type (type *) rather than struct hlist_node *
pos: a traversal pointer whose data type is struct hlist_node *
n: a temporary pointer used to store the node after pos; its data type is also struct hlist_node *
head: the head node of the hash list
member: the name of the hlist_node member in the definition of the data structure type
3> Implementation
#define hlist_for_each_entry(tpos, pos, head, member) \
	for (pos = (head)->first; \
	     pos && \
		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
	     pos = pos->next)

#define hlist_for_each_entry_safe(tpos, pos, n, head, member) \
	for (pos = (head)->first; \
	     pos && ({ n = pos->next; 1; }) && \
		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
	     pos = n)
2. Starting from the node after pos in the hash list
1> Function: traverses from the node after pos.
2> Function interface:
hlist_for_each_entry_continue(tpos, pos, member)
tpos: a traversal pointer whose data type is the containing structure type (type *) rather than struct hlist_node *
pos: a traversal pointer whose data type is struct hlist_node *
member: the name of the hlist_node member in the definition of the data structure type
3> Function implementation:
#define hlist_for_each_entry_continue(tpos, pos, member) \
	for (pos = (pos)->next; \
	     pos && \
		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
	     pos = pos->next)
3. Starting from the current node pos in the hash list
1> Function: the traversal begins at the current node pos. In contrast, hlist_for_each_entry_continue starts from the node after pos.
2> Function interface:
hlist_for_each_entry_from(tpos, pos, member)
tpos: a traversal pointer whose data type is the containing structure type (type *) rather than struct hlist_node *
pos: a traversal pointer whose data type is struct hlist_node *
member: the name of the hlist_node member in the definition of the data structure type
3> Implementation
#define hlist_for_each_entry_from(tpos, pos, member) \
	for (; pos && \
		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
	     pos = pos->next)
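Entry traversal can be tried in user space with the sketch below. It follows the four-argument form shown above, but as assumptions of this example it replaces the statement expression with the comma operator and spells typeof as __typeof__ (still a GCC/Clang extension); struct item and sum_values are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define hlist_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* tpos receives the containing record, pos walks the raw nodes. */
#define hlist_for_each_entry(tpos, pos, head, member)			\
	for (pos = (head)->first;					\
	     pos && (tpos = hlist_entry(pos, __typeof__(*tpos), member), 1); \
	     pos = pos->next)

struct item {			/* hypothetical record type */
	int value;
	struct hlist_node link;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Sum the value field of every record chained on h. */
static int sum_values(struct hlist_head *h)
{
	struct item *it;
	struct hlist_node *pos;
	int sum = 0;

	hlist_for_each_entry(it, pos, h, link)
		sum += it->value;
	return sum;
}
```

The loop body works with the record pointer directly, so callers do not need to call hlist_entry themselves on every iteration.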
