Operating system Hash table Linux Kernel application analysis

Last Update:2015-08-18 Source: Internet

Author: User

Tags array length prev

1. Basic Concepts

A hash table is a data structure that is accessed directly by key. It maps each key to a position in a table so that records can be found quickly. The mapping function is called the hash function, and the array that holds the records is called the hash table.

2. Common methods for constructing hash functions
A good hash function makes access to a set of data more efficient, because elements can be located quickly through it. Common ways to construct a hash function are:
(1) Direct addressing
(2) Digit analysis
(3) Mid-square method
(4) Folding
(5) Random number method
(6) Division-remainder method
3. Methods of handling collisions
Hash functions are designed to reduce collisions, but collisions cannot be avoided completely. Common collision-handling methods are:
(1) Open addressing
(2) Rehashing
(3) Separate chaining (the "zipper" method)
(4) A public overflow area
4. Hash table lookup performance analysis
The lookup process for a hash table is essentially the same as the process of building the table. Some keys are found directly at the address produced by the hash function; others collide at that address and must be located using the collision-handling method. In each of the collision-handling methods described above, the lookup after a collision is still a matter of comparing the given value with stored keys. The efficiency of a hash table is therefore still measured by the average search length.
  The number of key comparisons during a lookup depends on how many collisions occur: fewer collisions mean more efficient lookups, and more collisions mean less efficient ones. The factors that affect the number of collisions are therefore the factors that affect lookup efficiency. There are three:
1. Whether the hash function is uniform;
2. The method used to handle collisions;
3. The load factor of the hash table.
  The load factor of a hash table is defined as: α = number of elements in the table / length of the hash table.
  α indicates how full the hash table is. Since the table length is fixed, α is proportional to the number of elements in the table: the larger α is, the more elements the table holds and the more likely a collision becomes; the smaller α is, the less likely a collision. In fact, the average search length of a hash table is a function of the load factor α, although the function differs for each collision-handling method.
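The definitions above can be made concrete with a small sketch. It assumes the division-remainder method and a table length of 8; the names TABLE_LEN, hash_mod, load_factor and count_collisions are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_LEN 8u  /* hypothetical table length, chosen only for this sketch */

/* Division-remainder method: reduce the key modulo the table length. */
static unsigned int hash_mod(unsigned int key)
{
    return key % TABLE_LEN;
}

/* Load factor: alpha = number of elements in the table / length of the table. */
static double load_factor(size_t n_elements, size_t table_len)
{
    assert(table_len > 0);
    return (double)n_elements / (double)table_len;
}

/* Count keys that hash to a bucket which an earlier key already occupies. */
static size_t count_collisions(const unsigned int *keys, size_t n)
{
    unsigned char used[TABLE_LEN] = { 0 };
    size_t collisions = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned int bucket = hash_mod(keys[i]);

        if (used[bucket])
            collisions++;
        used[bucket] = 1;
    }
    return collisions;
}
```

With the keys 3, 11, 20 and 5, the keys 3 and 11 both map to bucket 3, so count_collisions reports one collision, and load_factor(4, 8) is 0.5.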
I. Linux kernel hash table data structure
  The most important thing is to choose an appropriate hash function, so that keys are distributed evenly across the buckets and lookup, insertion, and deletion stay fast. However, any hash function can produce collisions. The kernel resolves hash collisions by separate chaining (the "zipper" method): all keys that hash to the same address (synonyms) are linked into the same linked list. If the chosen table length is m, the hash table can be defined as an array of m head pointers, T[0..m-1] (in the kernel, an array of struct hlist_head). Every node whose hash address is i is inserted into the linked list headed by T[i]. Each element of T should initially be a null pointer. With chaining, the load factor α (number of stored elements / array length) can be greater than 1, but generally α ≤ 1 is chosen. Chaining also has a drawback: the pointers require extra space.
1. The code is in include/linux/list.h; the 3.0 kernel places the data structure definitions in include/linux/types.h.
Data structure definition of hash table:

struct hlist_head {
    struct hlist_node *first;
};

struct hlist_node {
    struct hlist_node *next, **pprev;
};

1> hlist_head represents the head node of a hash chain. Each entry (bucket) in the hash table corresponds to one linked list (hlist). The hlist_head struct has only one field, first, which points to the first node of the hlist.
2> The hlist_node struct has two fields, next and pprev.
(1) next points to the next hlist_node. If this node is the last one in the list, next is NULL.

(2) pprev is a pointer to a pointer: it points to the next pointer of the previous node (or to the first pointer of the head, if this node is the first one).

2. In Linux, hlist (the hash list) and list are not the same. In a list, every node, whether it is the head or any other node, uses the same struct. In an hlist, the head node is represented by struct hlist_head and all other nodes by struct hlist_node. In addition, list is a circular doubly linked list, whereas hlist is not circular, because the hlist head has no prev field. Why is it designed this way?

The purpose of a hash table is fast lookup, so the table is usually a large array; otherwise the probability of collisions is high and the hash table loses its point. How can a large table be maintained without consuming too much memory? The answer is to place only one pointer in each table entry (head node). This saves half of the pointer space, which matters especially when the number of buckets is large. (With two pointer fields, each head would occupy 8 bytes on a 32-bit system.)

3. An hlist node has two pointers, but pprev points to the previous node's next pointer. Why use pprev, a pointer to a pointer, instead of an ordinary single-level prev pointer?
A list is a complete circular list in which the head and the nodes share the same data structure, so a plain prev works directly. An hlist is different: its head has no prev, only first.
1> When a node is inserted at the head, the head's first pointer must be updated to point to the new node. To allow this to be done uniformly, hlist uses pprev. The pprev of a node does not point to the previous node itself, but to the next pointer inside the previous node (or to the first pointer, if the previous "node" is the head). Insertions and deletions can then read and modify the previous next (or first) pointer through node->pprev in exactly the same way, whether or not the head is involved.

2> It also resolves the mismatch between the two data structures. Because hlist_head.first and hlist_node.next have the same type (struct hlist_node *), a pprev that holds the address of either one works in both cases.
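The role of pprev can be illustrated with a small user-space sketch. The two structs are redeclared locally (this is not the kernel header), and link_two and pprev_points_back are hypothetical helpers written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* User-space copies of the two kernel structs, redeclared for illustration. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Hypothetical helper: link head -> a -> b by hand. */
static void link_two(struct hlist_head *h, struct hlist_node *a,
                     struct hlist_node *b)
{
    assert(h && a && b);
    h->first = a;
    a->pprev = &h->first;   /* first node: pprev is the address of head->first */
    a->next  = b;
    b->pprev = &a->next;    /* later node: pprev is the address of prev->next */
    b->next  = NULL;
}

/* Whatever precedes n, *n->pprev is the one pointer that currently points at n. */
static int pprev_points_back(const struct hlist_node *n)
{
    return *n->pprev == n;
}
```

The same test, *n->pprev == n, holds for the first node and for every later node, which is what lets insertion and deletion skip any special case for the head. The head itself stays a single pointer in size.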

II. Declaration and initialization macros for a hash table
1. Initializing the hash list head node
The struct hlist_head definition only describes the head node type; it does not declare an actual list head. The following three macros can be used to declare and initialize one:
#define HLIST_HEAD_INIT { .first = NULL }
#define HLIST_HEAD(name) struct hlist_head name = { .first = NULL }
#define INIT_HLIST_HEAD(ptr) ((ptr)->first = NULL)
1> name is a variable of type struct hlist_head.
2> The HLIST_HEAD_INIT macro only initializes.
E.g.: struct hlist_head my_hlist = HLIST_HEAD_INIT;
HLIST_HEAD_INIT only initializes the my_hlist head node, setting the head's first pointer to NULL.
3> The HLIST_HEAD(name) macro both declares and initializes.
E.g.: HLIST_HEAD(my_hlist);
The HLIST_HEAD macro declares the my_hlist head node and initializes it, setting the head's first pointer to NULL.
4> HLIST_HEAD and HLIST_HEAD_INIT initialize statically at compile time; a head can also be initialized at run time with INIT_HLIST_HEAD.
E.g.:
INIT_HLIST_HEAD(&my_hlist);
INIT_HLIST_HEAD initializes my_hlist by setting its first field to NULL.
2. Initialization of hash list nodes
1> Linux provides an interface for initializing a hash list node:
static inline void INIT_HLIST_NODE(struct hlist_node *h)
(1) h: the hash list node to initialize
2> Implementation:
static inline void INIT_HLIST_NODE(struct hlist_node *h)
{
    h->next = NULL;
    h->pprev = NULL;
}

This inline function initializes a struct hlist_node by setting both its next field and its pprev field to NULL.
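A minimal user-space sketch of the initializers described in this section follows. The structs and macros are redeclared locally rather than taken from the kernel headers; NBUCKETS, static_head, other_head and init_buckets are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

/* User-space copies of the kernel definitions, redeclared for illustration. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define HLIST_HEAD_INIT { .first = NULL }
#define HLIST_HEAD(name) struct hlist_head name = { .first = NULL }
#define INIT_HLIST_HEAD(ptr) ((ptr)->first = NULL)

static inline void INIT_HLIST_NODE(struct hlist_node *h)
{
    h->next = NULL;
    h->pprev = NULL;
}

#define NBUCKETS 4  /* hypothetical bucket count */

/* Compile-time initialization: declare-and-initialize, or initialize only. */
static HLIST_HEAD(static_head);
static struct hlist_head other_head = HLIST_HEAD_INIT;

/* Run-time initialization of a whole bucket array. */
static void init_buckets(struct hlist_head *buckets, size_t n)
{
    assert(buckets != NULL);
    for (size_t i = 0; i < n; i++)
        INIT_HLIST_HEAD(&buckets[i]);
}
```

A bucket array that lives on the stack or in allocated memory has no static initializer, which is the case the run-time form INIT_HLIST_HEAD is meant for.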

III. Basic operations on a hash list (insert, delete, test for empty)
1. Determine whether the hash list is empty
1> Function: determines whether the hash list is empty. Returns 1 if it is empty, otherwise 0.
2> Function interface:
static inline int hlist_empty(const struct hlist_head *h)
h: pointer to the head node of the hash list.
3> Function implementation:
static inline int hlist_empty(const struct hlist_head *h)
{
    return !h->first;
}
Emptiness is determined from the first field of the head node: if first is NULL, the hash list is empty.
2. Determine whether a node is in a hash list
1> Function: determines whether the node is currently linked into a hash list. Returns 1 if it is not (the node is "unhashed"), otherwise 0.
2> Function interface:
static inline int hlist_unhashed(const struct hlist_node *h)
h: pointer to the hash list node.
3> Function implementation:
static inline int hlist_unhashed(const struct hlist_node *h)
{
    return !h->pprev;
}
Whether the node is in a hash list is determined from its pprev field. For a linked node, h->pprev holds the address of the previous node's next field (or of the head's first field), so it is never NULL. If pprev is NULL, the node is not in any hash list.
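The two predicates can be exercised with a short user-space sketch. The structs are redeclared locally, and link_single is a hypothetical hand-written helper that links one node into an empty list, so that the example does not depend on the insertion functions.

```c
#include <assert.h>
#include <stddef.h>

/* User-space copies of the kernel definitions, redeclared for illustration. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static inline int hlist_empty(const struct hlist_head *h)
{
    return !h->first;
}

static inline int hlist_unhashed(const struct hlist_node *h)
{
    return !h->pprev;
}

/* Hypothetical helper: link a single node into an empty list by hand. */
static void link_single(struct hlist_head *h, struct hlist_node *n)
{
    assert(hlist_empty(h));
    n->next = NULL;
    n->pprev = &h->first;   /* non-NULL pprev is what marks the node as hashed */
    h->first = n;
}
```

Before link_single, the head is empty and the node is unhashed; afterwards both predicates return 0.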
3. Deleting from a hash list
1> Function: deletes a node from the hash list.
2> Function interface:
static inline void hlist_del(struct hlist_node *n)
n: pointer to a node in the hlist
static inline void hlist_del_init(struct hlist_node *n)
n: pointer to a node in the hlist
3> Function implementation:
static inline void __hlist_del(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;
    *pprev = next;
    if (next)
        next->pprev = pprev;
}
Step 1: Get next, the node after n.
Step 2: n->pprev holds the address of the next pointer of the node before n, so *pprev is that next pointer itself (which currently points to n).
Step 3: *pprev = next links the node before n to the node after n.
Step 4: If n is the last node of the list, n->next is NULL and nothing more needs to be done. Otherwise, next->pprev = pprev makes the pprev of the node after n take over n's pprev (that is, the following node's pprev is updated).
Suppose hlist_node used an ordinary single-level prev pointer instead. How would deletion work?
Step 3 would then have to determine whether n is the first node. Because the head is a different type, the first node's prev would be NULL, which distinguishes it from an ordinary node, and the head's first pointer would have to be updated separately:
struct my_hlist_node *next = n->next;
struct my_hlist_node *prev = n->prev;
if (prev)
    prev->next = next;
else
    head->first = next;    /* n was the first node; this needs access to the list head */
if (next)
    next->prev = prev;
Why not do it this way?
(1) The code is not concise enough. With pprev, the head and ordinary nodes are handled uniformly, and deletion does not need the list head at all.
static inline void hlist_del(struct hlist_node *n)
{
    __hlist_del(n);
    n->next = LIST_POISON1;
    n->pprev = LIST_POISON2;
}
Step 1: Call __hlist_del(n) to remove node n from the hash list (that is, relink the nodes before and after n).
Steps 2 and 3: Set the next and pprev fields of n to LIST_POISON1 and LIST_POISON2 respectively. This is done so that a node that is no longer in the list cannot be used to reach the list by accident.
static inline void hlist_del_init(struct hlist_node *n)
{
    if (!hlist_unhashed(n)) {
        __hlist_del(n);
        INIT_HLIST_NODE(n);
    }
}
Step 1: Check whether the node is in a hash list. If it is not, do nothing; if it is, continue with step 2.
Step 2: Call __hlist_del to remove node n.
Step 3: Call INIT_HLIST_NODE to reinitialize node n.
Note:
Both hlist_del and hlist_del_init call __hlist_del to remove node n. The only difference is what they do with n afterwards: the former marks n as unusable, and the latter resets it to an empty (unhashed) node.
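The difference between the two delete functions can be seen in a user-space sketch. The structs and functions are redeclared locally; the two poison values below are stand-ins picked for this example (the real constants come from the kernel headers), and build_pair is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Stand-in poison values for this sketch only; not the kernel's definitions. */
#define LIST_POISON1 ((struct hlist_node *)0x100)
#define LIST_POISON2 ((struct hlist_node **)0x200)

static inline void INIT_HLIST_NODE(struct hlist_node *h)
{
    h->next = NULL;
    h->pprev = NULL;
}

static inline int hlist_unhashed(const struct hlist_node *h)
{
    return !h->pprev;
}

static inline void __hlist_del(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;

    *pprev = next;              /* previous next (or head->first) skips n */
    if (next)
        next->pprev = pprev;    /* successor inherits n's pprev */
}

static inline void hlist_del(struct hlist_node *n)
{
    __hlist_del(n);
    n->next = LIST_POISON1;
    n->pprev = LIST_POISON2;
}

static inline void hlist_del_init(struct hlist_node *n)
{
    if (!hlist_unhashed(n)) {
        __hlist_del(n);
        INIT_HLIST_NODE(n);
    }
}

/* Hypothetical helper: build head -> a -> b by hand. */
static void build_pair(struct hlist_head *h, struct hlist_node *a,
                       struct hlist_node *b)
{
    assert(h && a && b);
    h->first = a;
    a->pprev = &h->first;
    a->next = b;
    b->pprev = &a->next;
    b->next = NULL;
}
```

Deleting the first node a updates the head without the head being passed in, because a's pprev already holds the address of h->first. After hlist_del_init the node reports as unhashed, so calling hlist_del_init on it a second time does nothing.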

4. Adding a hash node

1> Function: adds a node to the hash list.
hlist_add_head: inserts node n immediately after the head node h, making it the first node.
hlist_add_before: inserts node n before node next (next is already in the hash list).
hlist_add_after: inserts node next after node n (n is already in the hash list).
hlist_add_fake: newly added in the 3.0 kernel.
2> The Linux kernel provides three interfaces:
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
struct hlist_node *n: the hash node to be inserted.
struct hlist_head *h: the head node of the hash list.
static inline void hlist_add_before(struct hlist_node *n, struct hlist_node *next)
struct hlist_node *n: the hash node to be inserted.
struct hlist_node *next: a hash node already in the hash list.
static inline void hlist_add_after(struct hlist_node *n, struct hlist_node *next)
struct hlist_node *n: a hash node already in the hash list.
struct hlist_node *next: the hash node to be inserted.
Note: hlist_add_fake is new in the 3.0 kernel:
static inline void hlist_add_fake(struct hlist_node *n)
struct hlist_node *n: a hash list node.
3> Function implementation:
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;
    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}
Step 1: first = h->first gets the current first node of the list.
Step 2: Assign first to n->next, so that n is followed by the old first node.
Step 3: If first is not NULL, point first->pprev at n's next field. The old first node is now fully linked.
If first is NULL, nothing is done.
Step 4: h->first = n points the head's first field at n, making n the first node of the list.
Step 5: n->pprev = &h->first points n's pprev at the head's first field. Node n is now fully linked.
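The five steps can be checked with a user-space sketch. The structs and hlist_add_head are redeclared locally; hlist_count_checked is a hypothetical helper that walks the chain and asserts that every pprev is the address of the pointer leading to its node.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Hypothetical helper: count the nodes and verify each pprev on the way. */
static size_t hlist_count_checked(struct hlist_head *h)
{
    size_t count = 0;
    struct hlist_node **link = &h->first;

    for (struct hlist_node *pos = h->first; pos; pos = pos->next) {
        assert(pos->pprev == link);   /* pprev is the address of the pointer that leads here */
        link = &pos->next;
        count++;
    }
    return count;
}
```

Adding a, then b, then c with hlist_add_head leaves the chain in the order c, b, a, since each new node becomes the first one.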

/* next must be != NULL */

static inline void hlist_add_before(struct hlist_node *n, struct hlist_node *next)
{
    n->pprev = next->pprev;
    n->next = next;
    next->pprev = &n->next;
    *(n->pprev) = n;
}
Step 1: n->pprev = next->pprev copies next's pprev into n->pprev, so n's pprev now points to the next (or first) field of whatever preceded next.
Step 2: n->next = next points n's next at next, so that n is followed by next.
Step 3: next->pprev = &n->next updates next's pprev to hold the address of n's next field. The node next is now fully linked.
Step 4: *(n->pprev) = n. Here *(n->pprev) is the next (or first) pointer of whatever precedes n, and it is made to point to n. Node n is now fully linked.
Note:
(1) next must not be NULL (next is a node already in the hash list).
(2) n is the newly inserted node.
static inline void hlist_add_after(struct hlist_node *n, struct hlist_node *next)
{
    next->next = n->next;
    n->next = next;
    next->pprev = &n->next;
    if (next->next)
        next->next->pprev = &next->next;
}
Here n is a node already in the hash list and next is the newly inserted node; next is inserted after n.
Step 1: next->next = n->next points next->next at the node that currently follows n.
Step 2: n->next = next updates n's next so that n is followed by next.
Step 3: next->pprev = &n->next points next's pprev at n's next field.
Step 4: If there is no node after next, nothing more is done. Otherwise, the pprev of the node after next is pointed at next's next field.
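The two insertion functions above can be tried together in a user-space sketch. The structs and the three add functions are redeclared locally; hlist_nth is a hypothetical helper that returns the i-th node of a chain, or NULL if the chain is shorter.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* next must already be in a list; n is the new node. */
static inline void hlist_add_before(struct hlist_node *n, struct hlist_node *next)
{
    n->pprev = next->pprev;
    n->next = next;
    next->pprev = &n->next;
    *(n->pprev) = n;
}

/* n must already be in a list; next is the new node (3.0-era argument order). */
static inline void hlist_add_after(struct hlist_node *n, struct hlist_node *next)
{
    next->next = n->next;
    n->next = next;
    next->pprev = &n->next;
    if (next->next)
        next->next->pprev = &next->next;
}

/* Hypothetical helper: return the i-th node (0-based), or NULL if out of range. */
static struct hlist_node *hlist_nth(struct hlist_head *h, size_t i)
{
    struct hlist_node *pos;

    assert(h != NULL);
    pos = h->first;
    while (pos && i--)
        pos = pos->next;
    return pos;
}
```

Starting from a list that holds only b, hlist_add_before(&a, &b) puts a in front and, through a's pprev, also updates the head's first pointer; hlist_add_after(&b, &c) then appends c, giving the order a, b, c.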
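static inline void hlist_add_fake(struct hlist_node *n)
{
    n->pprev = &n->next;
}

The author is not entirely sure what this function is for and would welcome pointers from experts. What the code does is make n->pprev point to n's own next field, so that hlist_unhashed(n) returns 0 even though n is not on any list.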

IV. Other operations on a hash list
1. Moving a hash list
1> Function: moves the hash list from the old head node to a new head node and leaves the old head empty.
2> Interface:
static inline void hlist_move_list(struct hlist_head *old, struct hlist_head *new)
struct hlist_head *old: the head node of the original hash list.
struct hlist_head *new: the head node that takes over the hash list.
3> Implementation:
static inline void hlist_move_list(struct hlist_head *old, struct hlist_head *new)
{
    new->first = old->first;
    if (new->first)
        new->first->pprev = &new->first;
    old->first = NULL;
}
Step 1: Point the new head's first at the first node of the old list.
Step 2: Check whether there is a node after the head. If not, nothing is done. Otherwise, the pprev of that first node is pointed at the first field of the new head.

Step 3: Set the old head's first to NULL.
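The three steps can be checked in user space with the sketch below. The structs and functions are redeclared locally, and check_links is a hypothetical helper that asserts the pprev invariant along a chain.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

static inline void hlist_move_list(struct hlist_head *old, struct hlist_head *new)
{
    new->first = old->first;
    if (new->first)
        new->first->pprev = &new->first;  /* only the first node refers to the head */
    old->first = NULL;
}

/* Hypothetical helper: every pprev must be the address of the pointer leading to its node. */
static void check_links(struct hlist_head *h)
{
    struct hlist_node **link = &h->first;

    for (struct hlist_node *pos = h->first; pos; pos = pos->next) {
        assert(pos->pprev == link);
        link = &pos->next;
    }
}
```

Only the first node's pprev has to be rewritten, because it is the only pointer in the chain that refers to the head; the move therefore costs the same no matter how long the chain is.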

V. Traversing a hash list
To make it easier for kernel code to traverse the linked list, Linux abstracts the traversal operations into several macros. Before looking at the traversal macros, consider how to get from a list node to the data item that contains it.
1. hlist_entry(ptr, type, member)
1> Function: obtains a pointer to the whole structure from a pointer to one of its members.
A Linux hash list stores only the address of the hlist_node member embedded in each data item. The hlist_entry macro goes from that hlist_node member to the data structure that owns it.
2> Interface:
hlist_entry(ptr, type, member)
ptr: a pointer to the hlist_node member embedded in the data structure.
type: the type of the data structure.
member: the name of the hlist_node member in the data structure's type definition.
3> Implementation of the hlist_entry macro:
#define hlist_entry(ptr, type, member) container_of(ptr, type, member)
The hlist_entry macro simply calls the container_of macro; see the usage of the container_of macro for details.
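The following user-space sketch shows what hlist_entry computes. The container_of below is a simplified stand-in built on offsetof (the kernel's own macro adds type checking), and struct session and session_of are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };

/* Simplified stand-in for the kernel's container_of: subtract the member's offset. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define hlist_entry(ptr, type, member) container_of(ptr, type, member)

/* Hypothetical data item: the list node is embedded in it, not pointed to. */
struct session {
    int id;
    struct hlist_node link;
};

/* Recover the owning struct session from a pointer to its embedded node. */
static struct session *session_of(struct hlist_node *node)
{
    assert(node != NULL);
    return hlist_entry(node, struct session, link);
}
```

Given struct session s, passing &s.link to session_of returns &s, which is how the traversal macros get from a list node back to the data item.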
2. Traversal operations
1> Function: traversal is really just a for loop that walks the list from beginning to end. Because hlist is not a circular list, the loop continues while pos is not NULL. pos must not be deleted while traversing with hlist_for_each (pos->next must remain valid), or an invalid memory access (a SIGSEGV-style fault) will result. With hlist_for_each_safe, nodes can be deleted during traversal.
2> Interface:
The Linux kernel provides two interfaces for traversing a hash list:
hlist_for_each(pos, head)
pos: a cursor pointer (of type struct hlist_node *) used for the traversal.
head: the head of the list (a pointer to struct hlist_head).
hlist_for_each_safe(pos, n, head)
pos: a cursor pointer (of type struct hlist_node *) used for the traversal.
n: a temporary node pointer (struct hlist_node *) that holds the node after pos.
head: the head of the list (a pointer to struct hlist_head).
3> Function implementation:
(1) #define hlist_for_each(pos, head) \
    for (pos = (head)->first; pos; pos = pos->next)
pos is a cursor pointer. It starts at the first hash node and advances along the list; the loop ends when pos becomes NULL, meaning there are no more nodes to visit.
(2) #define hlist_for_each_safe(pos, n, head) \
    for (pos = (head)->first; pos && ({ n = pos->next; 1; }); pos = n)
hlist_for_each traverses by advancing the pos pointer. If the loop body deletes the node that pos points to, however, pos can no longer advance, because hlist_del(pos) sets the next and pprev of pos to the special values LIST_POISON1 and LIST_POISON2. The caller could cache the next pointer itself to keep the traversal going, but for consistency the Linux kernel hash list asks the caller to supply an extra pointer n of the same type as pos. The for loop stores the address of the node after pos in n, which avoids breaking the chain when the pos node is released.
The loop condition is pos && ({ n = pos->next; 1; }).
This expression first checks whether pos is NULL; if it is, evaluation stops there. If pos is non-NULL, ({ n = pos->next; 1; }) is evaluated. This is a compound statement expression whose value is that of its last statement, so it is always true, and as a side effect the node after pos is saved in n. In other words, the loop condition really only tests whether pos is non-NULL.

({ n = pos->next; 1; }) is a GCC-specific C extension (a statement expression); if it is unfamiliar, refer to the GCC extensions documentation.
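The sketch below deletes every node during a traversal with hlist_for_each_safe. It is a user-space illustration that assumes GCC or Clang (for the statement expression); the structs, macros and functions are redeclared locally, the poison values are stand-ins, and hlist_length and hlist_drain are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Stand-in poison values for this sketch only; not the kernel's definitions. */
#define LIST_POISON1 ((struct hlist_node *)0x100)
#define LIST_POISON2 ((struct hlist_node **)0x200)

#define hlist_for_each(pos, head) \
    for (pos = (head)->first; pos; pos = pos->next)
#define hlist_for_each_safe(pos, n, head) \
    for (pos = (head)->first; pos && ({ n = pos->next; 1; }); pos = n)

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

static inline void hlist_del(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;

    *pprev = next;
    if (next)
        next->pprev = pprev;
    n->next = LIST_POISON1;     /* pos->next is unusable from here on */
    n->pprev = LIST_POISON2;
}

/* Hypothetical helper: count nodes with the plain (non-deleting) traversal. */
static size_t hlist_length(struct hlist_head *h)
{
    struct hlist_node *pos;
    size_t len = 0;

    hlist_for_each(pos, h)
        len++;
    return len;
}

/* Hypothetical helper: unlink every node; n already holds the successor
 * by the time pos is poisoned, so the walk can continue. */
static size_t hlist_drain(struct hlist_head *h)
{
    struct hlist_node *pos, *n;
    size_t removed = 0;

    assert(h != NULL);
    hlist_for_each_safe(pos, n, h) {
        hlist_del(pos);
        removed++;
    }
    return removed;
}
```

Replacing hlist_for_each_safe with hlist_for_each in hlist_drain would make the loop step through pos->next after it has been set to the poison value, which is exactly the failure the safe variant avoids.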

VI. Traversing by the containing structure instead of by list node
Linux provides three ways to traverse: the first starts from the first node of the hash list, the second starts from the node after pos, and the third starts from the node pos itself.
1. Traversal from the first hash node of the hash list
1> Function: the traversal starts at the first hash node of the hash list. pos must not be deleted while traversing with hlist_for_each_entry (pos->next must remain valid), or an invalid memory access (a SIGSEGV-style fault) will result. With hlist_for_each_entry_safe, nodes can be deleted during traversal.
2> Linux provides two interfaces for traversing from the first node of the hash list:
hlist_for_each_entry(tpos, pos, head, member)
tpos: a traversal cursor whose type is a pointer to the containing structure rather than struct hlist_node *.
pos: a traversal cursor of type struct hlist_node *.
head: the head node of the hash list.
member: the name of the hlist_node member in the data item's type definition.
hlist_for_each_entry_safe(tpos, pos, n, head, member)
tpos: a traversal cursor whose type is a pointer to the containing structure rather than struct hlist_node *.
pos: a traversal cursor of type struct hlist_node *.
n: a temporary pointer that stores the node after pos; its type is also struct hlist_node *.
head: the head node of the hash list.
member: the name of the hlist_node member in the data item's type definition.
3> Implementation:
#define hlist_for_each_entry(tpos, pos, head, member) \
    for (pos = (head)->first; \
         pos && \
         ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
         pos = pos->next)

#define hlist_for_each_entry_safe(tpos, pos, n, head, member) \
    for (pos = (head)->first; \
         pos && ({ n = pos->next; 1; }) && \
         ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
         pos = n)
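Putting the pieces together, here is a user-space sketch of a small chained hash table that is searched with hlist_for_each_entry in the four-argument form shown above. It assumes GCC or Clang (statement expressions and __typeof__, used here in place of typeof so that it also compiles in strict ISO mode). The container_of stand-in, struct user, NBUCKETS, table, user_insert and user_find are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Simplified stand-in for the kernel's container_of. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define hlist_entry(ptr, type, member) container_of(ptr, type, member)

#define hlist_for_each_entry(tpos, pos, head, member) \
    for (pos = (head)->first; \
         pos && \
         ({ tpos = hlist_entry(pos, __typeof__(*tpos), member); 1; }); \
         pos = pos->next)

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

#define NBUCKETS 8u  /* hypothetical table length */

struct user {                 /* hypothetical data item */
    unsigned int uid;
    struct hlist_node link;   /* embedded list node */
};

/* Static storage is zero-initialized, so every bucket starts out empty. */
static struct hlist_head table[NBUCKETS];

/* Chain the item into the bucket chosen by the division-remainder method. */
static void user_insert(struct user *u)
{
    assert(u != NULL);
    hlist_add_head(&u->link, &table[u->uid % NBUCKETS]);
}

/* Walk only the one bucket the key hashes to, comparing keys along the chain. */
static struct user *user_find(unsigned int uid)
{
    struct user *tpos;
    struct hlist_node *pos;

    hlist_for_each_entry(tpos, pos, &table[uid % NBUCKETS], link) {
        if (tpos->uid == uid)
            return tpos;
    }
    return NULL;
}
```

Keys 3 and 11 land in the same bucket, so looking up either one walks a two-node chain; a lookup for 19 walks the same chain and returns NULL because no stored key matches.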
2. Traversal starting from the node after pos
1> Function: traverses starting from the node after pos.
2> Function interface:
hlist_for_each_entry_continue(tpos, pos, member)
tpos: a traversal cursor whose type is a pointer to the containing structure rather than struct hlist_node *.
pos: a traversal cursor of type struct hlist_node *.
member: the name of the hlist_node member in the data item's type definition.
3> Function implementation:
#define hlist_for_each_entry_continue(tpos, pos, member) \
    for (pos = (pos)->next; \
         pos && \
         ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
         pos = pos->next)
3. Traversal starting from the current node pos
1> Function: traverses starting from the current node pos itself. By contrast, hlist_for_each_entry_continue starts from the node after pos.
2> Function interface:
hlist_for_each_entry_from(tpos, pos, member)
tpos: a traversal cursor whose type is a pointer to the containing structure rather than struct hlist_node *.
pos: a traversal cursor of type struct hlist_node *.
member: the name of the hlist_node member in the data item's type definition.
3> Implementation:
#define hlist_for_each_entry_from(tpos, pos, member) \
    for (; pos && \
         ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
         pos = pos->next)
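The difference between the _continue and _from variants can be seen by summing values from a given node. This is a user-space sketch that assumes GCC or Clang (statement expressions and __typeof__); the container_of stand-in, struct item, sum_from and sum_after are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Simplified stand-in for the kernel's container_of. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define hlist_entry(ptr, type, member) container_of(ptr, type, member)

#define hlist_for_each_entry_continue(tpos, pos, member) \
    for (pos = (pos)->next; \
         pos && \
         ({ tpos = hlist_entry(pos, __typeof__(*tpos), member); 1; }); \
         pos = pos->next)

#define hlist_for_each_entry_from(tpos, pos, member) \
    for (; pos && \
         ({ tpos = hlist_entry(pos, __typeof__(*tpos), member); 1; }); \
         pos = pos->next)

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

struct item {                 /* hypothetical data item */
    int value;
    struct hlist_node link;
};

/* Sum the values starting at pos itself (pos may be NULL). */
static int sum_from(struct hlist_node *pos)
{
    struct item *tpos;
    int sum = 0;

    hlist_for_each_entry_from(tpos, pos, link)
        sum += tpos->value;
    return sum;
}

/* Sum the values starting at the node after pos (pos must not be NULL). */
static int sum_after(struct hlist_node *pos)
{
    struct item *tpos;
    int sum = 0;

    assert(pos != NULL);
    hlist_for_each_entry_continue(tpos, pos, link)
        sum += tpos->value;
    return sum;
}
```

For a chain holding the values 1, 2 and 4 in that order, sum_from on the middle node gives 6 while sum_after on the same node gives 4, since _continue skips the node it is handed.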

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
