PHP associative arrays and hash tables

Source: Internet
Author: User

One data type in PHP is especially important: the associative array, also known as a hash table. It is a very useful data structure.

A program often needs to remove the entries of one list from another. The simplest model is as follows:

There is a list of user names, storing 10000 user names with no duplicates;
There is also a blacklist that stores 2000 user names in the same format as the user name list;
Now you need to delete every blacklisted user name from the user name list, and do it as quickly as possible.

This example is small in scale. In practice the two tables may be very large, for example 200 million records.

The first method I came up with was a nested loop. With M records in the user list and N records in the blacklist, the number of iterations is M * N.
PHP code:


<?php
// Nested loops: compare every user name with every blacklist entry.
foreach ($arrayM as $keyM => $nameM) {
    foreach ($arrayN as $nameN) {
        // This comparison runs M * N times!
        // Use === (not =, which would assign) for a strict string comparison.
        if ($nameM === $nameN) {
            unset($arrayM[$keyM]);
        }
    }
}
return $arrayM;
?>

Another method is to use array indexes (keys).

PHP is a weakly typed language and does not restrict variable types as strictly as C does. In a C array, every element must have the same type, and the index starts from 0.
PHP arrays can also be indexed by strings; such arrays are called associative arrays.
Array keys are unique by nature, and accessing an element by key does not require a search through the array. The element is located directly.
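
These two properties can be seen in a minimal sketch. The array name $scores and the user names in it are made up for illustration.

```php
<?php
// Hypothetical data: string keys map user names to values.
$scores = ['alice' => 1, 'bob' => 2];

// Assigning to an existing key overwrites its value; no duplicate key appears.
$scores['alice'] = 10;

// isset() and array_key_exists() find the key through PHP's internal
// hash table, so user code needs no loop over the elements.
$hasAlice = isset($scores['alice']);
$hasCarol = array_key_exists('carol', $scores);
$size = count($scores);

var_dump($hasAlice, $hasCarol, $size);
```

Here $hasAlice is true, $hasCarol is false, and $size stays at 2 because the second assignment to 'alice' replaced the first.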

Returning to the problem above, we now take a different approach.

Organize the user names in the blacklist into an array whose keys are the user names.

Then, while traversing the user list, you only need isset to check whether a user name is in the blacklist.

PHP code:


<?php
// Build a lookup table whose keys are the blacklisted user names.
$arrayHash = array();
foreach ($arrayN as $nameN) {
    // This runs N times.
    $arrayHash[$nameN] = 1;
}

foreach ($arrayM as $keyM => $nameM) {
    // This check runs M times!
    if (isset($arrayHash[$nameM])) {
        unset($arrayM[$keyM]);
    }
}
return $arrayM;
?>

We can see that the optimized code loops only M + N times.
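
The same idea can be wrapped in a function so that it is easy to reuse. This is only a sketch: the function name removeBlacklisted and the sample names are hypothetical, and it swaps the explicit loops for the built-in array_fill_keys and array_filter functions.

```php
<?php
/**
 * Hypothetical helper: returns $users without any name found in $blacklist.
 * The original keys of $users are preserved, as in the loop version.
 */
function removeBlacklisted(array $users, array $blacklist): array
{
    // Build the lookup set once (name => true). This takes N steps.
    $blocked = array_fill_keys($blacklist, true);

    // One isset() check per user. This takes M steps.
    return array_filter(
        $users,
        function ($name) use ($blocked) {
            return !isset($blocked[$name]);
        }
    );
}

$users = ['alice', 'bob', 'carol', 'dave'];
$blacklist = ['bob', 'dave'];
$remaining = removeBlacklisted($users, $blacklist);
print_r($remaining);
```

For this sample data, $remaining keeps 'alice' under key 0 and 'carol' under key 2, because array_filter does not renumber the keys.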

If both M and N are 10000, the code loops 100 million times (10000 * 10000) before optimization. After optimization it loops only 20000 times, which is 5000 times fewer!
If the second program takes 1 second and each iteration costs about the same, the first program takes about 5000 seconds, which is nearly one and a half hours!
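
The step counts can be checked with a small experiment. The sketch below uses made-up data and a smaller scale (M = 1000, N = 200) so that the nested loop finishes quickly; the counter variables are only there for illustration.

```php
<?php
// Hypothetical test data, scaled down from the sizes in the text.
$m = 1000;
$n = 200;
$users = [];
for ($i = 0; $i < $m; $i++) {
    $users[] = "user$i";
}
$blacklist = [];
for ($i = 0; $i < $n; $i++) {
    $blacklist[] = 'user' . ($i * 5); // user0, user5, ..., user995
}

// Nested loops: every user is compared with every blacklist entry.
$nestedSteps = 0;
$nestedResult = $users;
foreach ($users as $key => $name) {
    foreach ($blacklist as $blocked) {
        $nestedSteps++;
        if ($name === $blocked) {
            unset($nestedResult[$key]);
        }
    }
}

// Hash lookup: one pass over the blacklist, then one pass over the users.
$hashSteps = 0;
$lookup = [];
foreach ($blacklist as $blocked) {
    $hashSteps++;
    $lookup[$blocked] = 1;
}
$hashResult = $users;
foreach ($users as $key => $name) {
    $hashSteps++;
    if (isset($lookup[$name])) {
        unset($hashResult[$key]);
    }
}

echo "nested: $nestedSteps steps, hash: $hashSteps steps\n";
```

With these sizes the nested version counts 1000 * 200 = 200000 steps and the hash version counts 1000 + 200 = 1200 steps, while both produce the same result. Actual running time depends on the machine, so only the step counts are compared here.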

========================================================== ======================================
Hashing is a complicated topic, but the basic idea is not that hard to understand. Here are some notes.

Hash: translated literally, the word means something jumbled or mixed up. Some people translate it as "scatter".

To put it simply, hashing converts a complex string into a simple value (usually a number).
For example, take "abcd", add up the values of its characters, and then take the remainder of dividing by 10, that is, (a + b + c + d) % 10, to obtain a number. Suppose the result is 5. Then this 5 represents the string abcd in a certain sense. In other words, this 5 is a mark of the string, and a simplified one, so some people call it the string's digest or fingerprint.
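
The toy calculation can be written as a short function. This is a sketch that assumes single-byte characters, and the name toyHash is made up. With ASCII character codes the function returns 4 for "abcd", so the 5 used in the text should be read as an illustrative value only.

```php
<?php
// Toy hash from the text: add the character codes, then take the remainder of 10.
// Assumption: single-byte characters, with ord() giving each byte's value.
function toyHash(string $s): int
{
    $sum = 0;
    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        $sum += ord($s[$i]);
    }
    return $sum % 10;
}

// 97 + 98 + 99 + 100 = 394, and 394 % 10 = 4
$h = toyHash('abcd');
echo $h, "\n";
```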
This 5 can be used as the subscript of an array. For example, I can declare a pointer array void *hash_array[10] and store a pointer at position 5 that points to the abcd string.
Then, to find out whether a string exists, I do not need to loop over an array and compare strings one by one, which is slow. I first compute the hash value of the string and then use that value directly as the array subscript. This is much faster, especially with a large amount of data.
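
A minimal sketch of this direct lookup, using a PHP array of 10 slots in place of the C pointer array. The names toyHash, $slots, and contains are hypothetical, and the sketch deliberately ignores the case where two strings share a slot.

```php
<?php
// Toy hash: sum of the byte values, remainder of 10 (single-byte characters assumed).
function toyHash(string $s): int
{
    $sum = 0;
    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        $sum += ord($s[$i]);
    }
    return $sum % 10;
}

// Ten slots, all empty at first; this mirrors void *hash_array[10] in the text.
$slots = array_fill(0, 10, null);

// Store: compute the slot once and write the string there.
$slots[toyHash('abcd')] = 'abcd';

// Query: one hash computation and one slot check, with no loop over the table.
function contains(array $slots, string $s): bool
{
    return $slots[toyHash($s)] === $s;
}

$found = contains($slots, 'abcd');
$missing = contains($slots, 'wxyz');
var_dump($found, $missing);
```

Here $found is true and $missing is false. The cost of a query does not grow with the number of stored strings, only with the length of the string being hashed.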

Note that the computed hash value does not have to start from 0; in the example it is 5. In other words, the entry lands at an arbitrary position in the array, a position it has been scattered to, while other positions may remain empty. This is why such an array or table is called a hash table.

But there is a problem. The conversion above simply adds the characters and takes a remainder, so when the string is changed to abdc the result is still 5. This is a weakness of the algorithm: it cannot guarantee uniqueness. Many hash algorithms, such as MD4, MD5, and SHA-1, have been designed to make such collisions far less likely, although no hash function with a fixed-size output can rule them out completely.
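
The collision is easy to reproduce. The sketch below uses a hypothetical toyHash function (sum of byte values, remainder of 10) and PHP's built-in md5 function for comparison.

```php
<?php
// Toy hash: sum of the byte values, remainder of 10 (single-byte characters assumed).
function toyHash(string $s): int
{
    $sum = 0;
    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        $sum += ord($s[$i]);
    }
    return $sum % 10;
}

// Addition ignores the order of the characters, so both strings collide.
$same = toyHash('abcd') === toyHash('abdc');

// md5() mixes the input far more thoroughly; these two digests differ.
$md5Differs = md5('abcd') !== md5('abdc');

var_dump($same, $md5Differs);
```

Both variables are true: the toy hash collides on the two strings, while the MD5 digests of these particular inputs are different.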
However, the simple algorithm can still be used. After abdc yields the hash value 5, check whether position 5 is already occupied. If it is, add 1 to get 6. If 6 is free, store the entry there. If a later string hashes to 6 but 6 is occupied, add 1 again and store it in the next free position.
When retrieving data, first compute the hash value and check whether the content at that position is what you want. If it is not, add 1 and check again, and repeat until the entry is found.
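
This "add 1 and try again" scheme is commonly known as linear probing. The sketch below is one possible reading of it. The names probeInsert and probeFind are made up, and two details are assumptions not stated in the text: the search wraps around to slot 0 after the last slot, and a lookup stops as soon as it reaches an empty slot.

```php
<?php
const TABLE_SIZE = 10;

// Toy hash: sum of the byte values, remainder of the table size.
function toyHash(string $s): int
{
    $sum = 0;
    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        $sum += ord($s[$i]);
    }
    return $sum % TABLE_SIZE;
}

// Insert: start at the hash slot and step forward by 1 (wrapping around)
// until a free slot, or a slot already holding the same string, is found.
function probeInsert(array &$slots, string $s): bool
{
    $start = toyHash($s);
    for ($step = 0; $step < TABLE_SIZE; $step++) {
        $i = ($start + $step) % TABLE_SIZE;
        if ($slots[$i] === null || $slots[$i] === $s) {
            $slots[$i] = $s;
            return true;
        }
    }
    return false; // the table is full
}

// Lookup: follow the same path; an empty slot means the string is absent.
function probeFind(array $slots, string $s): ?int
{
    $start = toyHash($s);
    for ($step = 0; $step < TABLE_SIZE; $step++) {
        $i = ($start + $step) % TABLE_SIZE;
        if ($slots[$i] === null) {
            return null;
        }
        if ($slots[$i] === $s) {
            return $i;
        }
    }
    return null;
}

$slots = array_fill(0, TABLE_SIZE, null);
probeInsert($slots, 'abcd'); // lands in its own hash slot
probeInsert($slots, 'abdc'); // same hash, so it moves on to the next slot
var_dump(probeFind($slots, 'abcd'), probeFind($slots, 'abdc'));
```

With ASCII codes both strings hash to 4, so 'abcd' is stored in slot 4 and 'abdc' in slot 5. Deleting entries is not handled here; in this scheme it needs extra care, because emptying a slot would cut the search path of later entries.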

So the hash table is not filled in order from the start like an ordinary array; its positions are filled in gradually as entries are added.
The content stored in a hash table slot is generally a pointer, which can point to a larger structure. This structure can contain the key and value information.
The hash table also does not have to be a plain array. It can be organized as a linked list, and the node structure in the linked list can contain a numeric field hash_value for quick search.
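
One common way to combine a table with linked lists is separate chaining, where each slot holds a list of nodes. Whether this is exactly the layout the text has in mind is an assumption. The class ChainNode and the functions fullHash, chainPut, and chainGet are hypothetical names for this sketch.

```php
<?php
// Hypothetical node: key, value, the key's numeric hash, and a link to the next node.
class ChainNode
{
    public $key;
    public $value;
    public $hashValue;
    public $next;

    public function __construct(string $key, $value, int $hashValue, ?ChainNode $next)
    {
        $this->key = $key;
        $this->value = $value;
        $this->hashValue = $hashValue;
        $this->next = $next;
    }
}

// Sum of the byte values, without the remainder step, stored in each node.
function fullHash(string $s): int
{
    $sum = 0;
    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        $sum += ord($s[$i]);
    }
    return $sum;
}

function chainPut(array &$buckets, string $key, $value): void
{
    $h = fullHash($key);
    $i = $h % count($buckets);
    for ($node = $buckets[$i]; $node !== null; $node = $node->next) {
        if ($node->hashValue === $h && $node->key === $key) {
            $node->value = $value; // key already present: update it
            return;
        }
    }
    // New key: put a node at the head of this slot's list.
    $buckets[$i] = new ChainNode($key, $value, $h, $buckets[$i]);
}

function chainGet(array $buckets, string $key)
{
    $h = fullHash($key);
    for ($node = $buckets[$h % count($buckets)]; $node !== null; $node = $node->next) {
        // Comparing the stored number first avoids a string comparison
        // whenever the numeric hashes differ.
        if ($node->hashValue === $h && $node->key === $key) {
            return $node->value;
        }
    }
    return null;
}

$buckets = array_fill(0, 10, null);
chainPut($buckets, 'abcd', 'first');
chainPut($buckets, 'abdc', 'second');
var_dump(chainGet($buckets, 'abcd'), chainGet($buckets, 'abdc'));
```

Both keys land in the same slot, yet each is found again because the list is walked node by node. Storing hash_value in the node lets the search skip a string comparison whenever the numbers differ; for 'abcd' and 'abdc' the sums happen to be equal, so the key comparison decides.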

Although hashing is often associated with encryption and similar uses, it can also be used in ordinary application code to store simple data, which improves the efficiency of the code.
