Linux kernel hash lookup (1)

Last Update:2015-09-21 Source: Internet

Author: User

Tags mathematical functions


Lookup is essential in the kernel. The kernel manages a large number of user processes and must be able to
locate any one of them quickly; the address space of a process contains multiple virtual memory areas, and
the kernel must be able to locate one of those quickly as well. The two most widely used approaches are
tree-based lookup (red-black trees) and computation-based lookup (hash lookup). Both are efficient and well
suited to the kernel. Lookup over a linear table (binary search) is also efficient, but it requires a sorted
array, and the kernel rarely manages dynamically changing data such as processes in a plain array, so binary
search is generally a poor fit for these cases.
--------------------------------------------------------------------------------
1, initializing the kernel hash table (for fast lookup of processes)

/*
 * The pid hash table is scaled according to the amount of memory in the
 * machine.  From a minimum of 16 slots up to 4096 slots at one gigabyte or
 * more.
 */
void __init pidhash_init(void)
{
	int i, pidhash_size;

	/* allocate space for the array */
	pid_hash = alloc_large_system_hash("PID", sizeof(*pid_hash), 0, 18,
					   HASH_EARLY | HASH_SMALL,
					   &pidhash_shift, NULL, 4096);
	pidhash_size = 1 << pidhash_shift;	/* length of the array */

	for (i = 0; i < pidhash_size; i++)
		INIT_HLIST_HEAD(&pid_hash[i]);	/* set each list head in the array to NULL */
}

The idea of initialization is simple: create the hash array (a term used here for the array of list heads
inside the hash table) and initialize every element. The function is called from start_kernel. To learn the
length of the array, a small kernel module is enough. pidhash_shift is a global variable that is not
exported, but its address can be found in the kernel symbol table with
cat /proc/kallsyms | grep pidhash_shift. Printing the value stored at that address gives the shift, and the
length of the array is 1 << pidhash_shift. As the code above shows, the length also depends on the amount
of memory in the machine; with a typical amount of memory it is 4096. The small module below reads the
value of pidhash_shift.
--------------------------------------------------------------------------------------------

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/moduleparam.h>
#include <linux/list.h>
#include <linux/hash.h>

/*
 * Address taken from: cat /proc/kallsyms | grep pidhash_shift
 * It is specific to the running kernel and must be looked up on each machine.
 */
unsigned int *p_shift = (unsigned int *)0xc176ab0c;

static int __init hash_size_init(void)
{
	printk("pidhash_shift----------------->%u\n", *p_shift);
	return 0;
}

static void __exit hash_size_exit(void)
{
	printk("<1>exit---------------------!\n");
}

module_init(hash_size_init);
module_exit(hash_size_exit);
MODULE_LICENSE("GPL");
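
Once the module has printed the shift, the slot count follows from the same expression that pidhash_init
uses. A minimal userspace sketch of that arithmetic; the helper name slots_from_shift is made up for this
illustration and is not a kernel function:

```c
#include <assert.h>

/*
 * Number of hash-array slots for a given shift, mirroring
 * pidhash_size = 1 << pidhash_shift in pidhash_init().
 * slots_from_shift is a hypothetical helper, not a kernel API.
 */
static unsigned int slots_from_shift(unsigned int shift)
{
	assert(shift < 32);	/* keep the shift within a 32-bit unsigned int */
	return 1u << shift;
}
```

A printed shift of 12 therefore corresponds to the 4096-slot table mentioned above, and a shift of 4 to
16 slots.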

--------------------------------------------------------------------------------------------
2, the hash function.
A hash lookup is inseparable from its hash function. A hash function is simply a mathematical function used
to distribute entries across the hash table so that as few collisions as possible occur.

#define pid_hashfn(nr, ns)	\
	hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)

#define hash_long(val, bits) hash_32(val, bits)

/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
#define GOLDEN_RATIO_PRIME_32 0x9e370001UL

static inline u32 hash_32(u32 val, unsigned int bits)
{
	/* On some cpus multiply is faster, on others gcc will do shifts */
	u32 hash = val * GOLDEN_RATIO_PRIME_32;

	/* High bits are more random, so use them. */
	return hash >> (32 - bits);
}

The index at which the kernel places a process in the hash table is therefore determined by both the process
number and the namespace. The function used is also fairly involved, presumably to reduce the number of
collisions.
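
To see what index such a function produces, the 32-bit hash quoted above can be copied into a userspace
program. This is only a sketch for a 32-bit configuration: pid_hashfn_sketch and its ns_addr parameter are
names invented here, with ns_addr standing in for the namespace pointer value.

```c
#include <assert.h>
#include <stdint.h>

#define GOLDEN_RATIO_PRIME_32 0x9e370001UL

/*
 * Userspace copy of the 32-bit multiplicative hash: multiply by the
 * constant (wrapping at 32 bits) and keep the top `bits` bits.
 */
static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	uint32_t hash = val * (uint32_t)GOLDEN_RATIO_PRIME_32;
	return hash >> (32 - bits);
}

/*
 * Sketch of pid_hashfn: the PID number and the namespace address are
 * added and the sum is hashed. `ns_addr` is an assumption of this
 * sketch; the kernel casts the namespace pointer itself.
 */
static uint32_t pid_hashfn_sketch(uint32_t nr, uint32_t ns_addr,
				  unsigned int shift)
{
	return hash_32(nr + ns_addr, shift);
}
```

With a shift of 12 every result lies in the range 0 to 4095, so it can be used directly as a subscript into
a 4096-slot array. Because the namespace address is part of the input, the same PID number in two different
namespaces will usually land in different slots.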
--------------------------------------------------------------------------------------------
3, how hash collisions are resolved.
Hash tables in the kernel generally resolve collisions by separate chaining, which works well in practice:
nodes that hash to the same slot are linked together in a doubly linked list.
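
The chaining idea can be sketched in userspace with structures shaped like the kernel's hlist types. The
function names chain_add_head, chain_del and chain_len are illustrative only; the kernel's own helpers live
in its list header and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Shaped like the kernel's hlist types: the head is a single pointer,
 * and each node keeps `pprev`, the address of the pointer that points
 * at it, so unlinking needs no special case for the first node.
 */
struct hlist_head { struct hlist_node *first; };
struct hlist_node { struct hlist_node *next, **pprev; };

/* Head insertion: the new node becomes the first entry of the chain. */
static void chain_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink a node in constant time without walking the chain. */
static void chain_del(struct hlist_node *n)
{
	assert(n->pprev != NULL);	/* node must currently be on a chain */
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

/* Count the nodes that collided into one slot. */
static size_t chain_len(const struct hlist_head *h)
{
	size_t len = 0;

	for (const struct hlist_node *p = h->first; p; p = p->next)
		len++;
	return len;
}
```

Keeping only one pointer in the head halves the size of the hash array compared with a head that stores
both a first and a last pointer, which matters when the array has thousands of slots.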

-----------------------------------------------------------------------------------------
4, inserting the "index" of the process into the hash table during process creation.
The struct pid is inserted into the hash table in the alloc_pid function.
In kernels of this era the call path is do_fork -> copy_process -> alloc_pid.

----------------------------------------------------------------------------------------------
Code in alloc_pid:

	spin_lock_irq(&pidmap_lock);
	for ( ; upid >= pid->numbers; --upid)
		hlist_add_head_rcu(&upid->pid_chain,
				&pid_hash[pid_hashfn(upid->nr, upid->ns)]);
	spin_unlock_irq(&pidmap_lock);

alloc_pid takes the nr and ns stored in each struct upid of the newly created struct pid as the input of the
hash function to compute the subscript into the hash array, then uses head insertion to add the
struct hlist_node pid_chain node to the list at that slot.
----------------------------------------------------------------------------------------------
5, using hash lookup to quickly locate the struct pid
The kernel provides the find_vpid function, which quickly finds a process's struct pid from the PID of the
process. It does so by searching the hash table that was filled in when the process was created.

struct pid *find_pid_ns(int nr, struct pid_namespace *ns)
{
	struct hlist_node *elem;
	struct upid *pnr;

	hlist_for_each_entry_rcu(pnr, elem,
			&pid_hash[pid_hashfn(nr, ns)], pid_chain)
		if (pnr->nr == nr && pnr->ns == ns)
			return container_of(pnr, struct pid,
					numbers[ns->level]);

	return NULL;
}

find_vpid simply calls find_pid_ns, which takes two parameters: the process PID (nr) and the namespace (ns).
With these two values the hash function yields the subscript into the hash array, and traversing the list at
that slot finds the struct pid of the process.
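
The insert and lookup steps can be tried together in a small userspace model. It is a sketch under several
assumptions: the table has 16 slots, chains are singly linked for brevity, no locking or RCU is modelled,
and the names struct ns, bucket_of, pid_insert and find_pid_sketch are invented for the illustration. The
kernel passes numbers[ns->level] straight to container_of; the sketch steps back to the first element
instead so that it only uses the standard offsetof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHIFT 4			/* 16 slots; assumed for this sketch */
#define MAX_LEVEL 2		/* assumed namespace nesting limit */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct ns { int level; };	/* stand-in for struct pid_namespace */
struct upid { int nr; struct ns *ns; struct upid *next; };
struct pid { int level; struct upid numbers[MAX_LEVEL + 1]; };

static struct upid *table[1u << SHIFT];	/* the hash array */

/* Hash (nr + namespace address) down to a slot index. */
static unsigned int bucket_of(int nr, const struct ns *ns)
{
	uint32_t val = (uint32_t)nr + (uint32_t)(uintptr_t)ns;

	return (uint32_t)(val * UINT32_C(0x9e370001)) >> (32 - SHIFT);
}

/* Register every (nr, ns) pair of a pid, from its own level down to
 * level 0, each by head insertion. */
static void pid_insert(struct pid *pid)
{
	assert(pid->level >= 0 && pid->level <= MAX_LEVEL);
	for (int i = pid->level; i >= 0; --i) {
		struct upid *u = &pid->numbers[i];
		unsigned int b = bucket_of(u->nr, u->ns);

		u->next = table[b];
		table[b] = u;
	}
}

/* Walk one chain; on a match, recover the enclosing struct pid. */
static struct pid *find_pid_sketch(int nr, struct ns *ns)
{
	for (struct upid *u = table[bucket_of(nr, ns)]; u; u = u->next)
		if (u->nr == nr && u->ns == ns) {
			struct upid *first = u - ns->level; /* &numbers[0] */
			return container_of(first, struct pid, numbers);
		}
	return NULL;
}
```

A process created in a child namespace is thus reachable under two keys: its number in the child namespace
and its number in the parent. Looking up a number together with the wrong namespace returns NULL, because
both fields are compared while walking the chain.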
