PHP associative arrays and hash tables

Source: Internet
Author: User

PHP has a very important data type: the associative array, also known as a hash table, which is a very useful data structure.

In a program, we may need to remove from one list the entries that also appear in another. Here is the simplest model:

There is a list of user names that stores 10,000 user names with no duplicates.
There is also a blacklist that stores 2,000 user names in the same format as the user name list.
Now we need to remove the blacklisted user names from the user list, and to do it as quickly as possible.

The amount of data here is small. In a more realistic case the two tables could be very large, with 200 million records, for example.

The first method I thought of was a nested loop. If the user name table has M records and the blacklist has N records, the number of loop iterations is M * N!
PHP version of the code:


<?php
foreach ($arrayM as $keyM => $nameM) {
    foreach ($arrayN as $nameN) {
        // This comparison executes M * N times!
        if ($nameM == $nameN) {
            unset($arrayM[$keyM]);
        }
    }
}
return $arrayM;
?>
Another way is to take advantage of array indexes.

PHP is a weakly typed language; unlike C, it does not place strict restrictions on variable types. In a C array, every element must be of the same type and the index starts at 0.
A PHP array can be indexed by a string, in which case it is also known as an associative array.
Array indexes have a natural property: they cannot repeat, and accessing an element by index requires no search because it can be located directly.
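
A minimal sketch of these two properties (unique string keys and direct lookup). The variable $blacklist and the user names are made up for illustration:

```php
<?php
// Illustrative data only: these names do not come from the original article.
$blacklist = array();
$blacklist['alice'] = 1;
$blacklist['bob']   = 1;
$blacklist['alice'] = 1; // same key again: overwrites, does not add an entry

echo count($blacklist), "\n";                                // 2
echo isset($blacklist['bob'])   ? "found" : "missing", "\n"; // found
echo isset($blacklist['carol']) ? "found" : "missing", "\n"; // missing
```

Because the key itself locates the entry, isset does not have to walk through the array to answer.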

For the same problem as before, we take another approach.

Organize the user names in the blacklist into an array whose indexes are the user names.

Then, when traversing the user list, simply use isset to check whether each user name exists in that array.

PHP version of the code:


<?php
$arrayHash = array();
foreach ($arrayN as $nameN) {
    // This line executes N times.
    $arrayHash[$nameN] = 1;
}

foreach ($arrayM as $keyM => $nameM) {
    // This check executes M times!
    if (isset($arrayHash[$nameM])) {
        unset($arrayM[$keyM]);
    }
}
return $arrayM;
?>
You can see that in the optimized code the number of loop iterations is M + N.

If M and N are both 10,000, the loop runs 100 million times before optimization and only 20,000 times after it, a difference of 5,000 times!
If the second program takes 1 second, the first will take roughly 5,000 seconds, or nearly 1.4 hours!
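
The iteration counts can be checked with a small self-contained sketch. The function names removeNested and removeHashed, the $steps counters, and the generated user names are assumptions made for this example; they are not part of the original code:

```php
<?php
// Hypothetical helpers: the $steps counters exist only to make the loop counts visible.
function removeNested(array $users, array $blacklist, &$steps) {
    $steps = 0;
    foreach ($users as $key => $name) {
        foreach ($blacklist as $banned) {
            $steps++;
            if ($name == $banned) {
                unset($users[$key]);
            }
        }
    }
    return $users;
}

function removeHashed(array $users, array $blacklist, &$steps) {
    $steps = 0;
    $hash = array();
    foreach ($blacklist as $banned) {
        $steps++;
        $hash[$banned] = 1; // the user name becomes the array index
    }
    foreach ($users as $key => $name) {
        $steps++;
        if (isset($hash[$name])) {
            unset($users[$key]);
        }
    }
    return $users;
}

// Made-up data: M = 1,000 users, N = 200 of them blacklisted.
$users = array();
for ($i = 0; $i < 1000; $i++) {
    $users[] = "user$i";
}
$blacklist = array_slice($users, 0, 200);

$a = removeNested($users, $blacklist, $nestedSteps);
$b = removeHashed($users, $blacklist, $hashedSteps);

echo $nestedSteps, "\n"; // 200000 (M * N)
echo $hashedSteps, "\n"; // 1200   (M + N)
var_dump($a === $b);     // bool(true): both approaches keep the same users
```

The sizes are kept small so the nested version finishes quickly; the ratio between the two counters grows as the lists grow.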

=========================================================================
Hashing seems complicated, but it is actually not that hard to understand. Here are some notes.

The word "hash" is rendered in Chinese in several ways: some translate it by meaning (roughly "jumble" or "scatter"), and others simply transliterate it.

Simply put, hashing converts a complex string into something simple (usually a number) by some transformation.
For example, take "ABCD", add up the values of its characters, and take the remainder of dividing by 10, that is, (A+B+C+D) % 10, to get a number. Suppose the result is 5. Then this 5 can represent the string ABCD in a certain sense. In other words, the 5 is a mark of the string, and a simplified one, so some people call it the digest or fingerprint of the string.
The nice thing about this 5 is that it can be used as an array subscript. If I construct an array of pointers, void* hash_array[10], I can fill position 5 with a pointer, for example one that points to the string ABCD.
Then, to check whether a string exists, there is no need for the slow operation of comparing it against every string in an array. Instead, first compute the hash value of the string, then use that value as the array subscript to look it up directly. This is much faster, especially when there is a lot of data.
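
The toy hash described here can be sketched in PHP as follows. The function name simpleHash and the $slots array are hypothetical names for this example. Note that with ASCII character codes "ABCD" actually hashes to 6; the 5 in the text is only an illustrative value.

```php
<?php
// Toy hash from the text: add the character codes, take the remainder of 10.
// simpleHash is a hypothetical name used only in this sketch.
function simpleHash($str, $size = 10) {
    $sum = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $sum += ord($str[$i]);
    }
    return $sum % $size;
}

$slots = array_fill(0, 10, null);    // plays the role of hash_array[10]
$slots[simpleHash("ABCD")] = "ABCD"; // store the string at its hashed position

// Lookup: hash the query, then inspect one slot instead of scanning everything.
$query = "ABCD";
var_dump($slots[simpleHash($query)] === $query); // bool(true)
echo simpleHash("ABCD"), "\n"; // 6 with ASCII codes (65 + 66 + 67 + 68 = 266)
```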

As you can see, the hash value computed above does not necessarily start from 0; the one we computed is 5. In other words, the 5 is a scattered, unpredictable position in the array, and other positions may still be empty. This is why such an array or table is called a hash table.

But there is a problem. With the conversion above (add directly, then take the remainder), the string ABDC also yields 5. That is, the algorithm cannot guarantee uniqueness. This is why there has been a great deal of research into hash algorithms such as MD4, MD5, and SHA-1, which make such collisions far less likely, although no hash function can rule them out entirely.
The algorithm above can still be used, though. After getting 5 as the hash of ABDC, check whether position 5 is occupied. If it is, add 1 to get 6; if 6 is not occupied, store the value there. If a later string hashes to 6 but 6 is already occupied, add 1 again and store it there.
When retrieving data, compute the hash value, then check whether the content at that position is what you want. If not, add 1 and check again, until you find it.
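
The "add 1 until a free slot is found" procedure can be sketched like this. The names toyHash, probeInsert, and probeFind are hypothetical, and wrapping around to slot 0 at the end of the table is an assumption, since the text only says to add 1:

```php
<?php
// Same toy hash as in the text: sum of character codes modulo the table size.
function toyHash($str, $size) {
    $sum = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $sum += ord($str[$i]);
    }
    return $sum % $size;
}

// Store $str, stepping forward one slot at a time while the slot is occupied.
function probeInsert(array &$table, $str) {
    $size = count($table);
    $start = toyHash($str, $size);
    for ($i = 0; $i < $size; $i++) {
        $pos = ($start + $i) % $size; // wrap-around is an assumption of this sketch
        if ($table[$pos] === null || $table[$pos] === $str) {
            $table[$pos] = $str;
            return $pos;
        }
    }
    return -1; // every slot is occupied
}

// Find $str by hashing it and then checking slot after slot.
function probeFind(array $table, $str) {
    $size = count($table);
    $start = toyHash($str, $size);
    for ($i = 0; $i < $size; $i++) {
        $pos = ($start + $i) % $size;
        if ($table[$pos] === null) {
            return -1; // empty slot reached: the string was never stored
        }
        if ($table[$pos] === $str) {
            return $pos;
        }
    }
    return -1;
}

$table = array_fill(0, 10, null);
echo probeInsert($table, "ABCD"), "\n"; // 6
echo probeInsert($table, "ABDC"), "\n"; // 7 (slot 6 is already taken)
echo probeFind($table, "ABDC"), "\n";   // 7
echo probeFind($table, "XYZ"), "\n";    // -1
```

This sketch never deletes entries; removing one would require extra care so that later lookups do not stop early at the freed slot.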

So the contents of a hash table are not laid out in order like an ordinary array; the table fills up gradually as entries are added.
The entries of a hash table are generally pointers, and a pointer may well point to a large structure. That structure can hold the key and value information.
A hash table also does not have to be an array. It can be organized as a linked list, where each node has a field holding the hash_value number, used for fast lookup.
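
One way to read the list variant is sketched below: each node carries its precomputed hash_value next to the key and value, so a lookup compares cheap integers first. The field names key and value, the function listFind, and the use of PHP's built-in crc32 as the hash function are assumptions for this example, and a PHP array stands in for the linked list:

```php
<?php
// Each node keeps the precomputed hash next to the key and the value.
$list = array();
foreach (array("ABCD" => "first", "WXYZ" => "second") as $key => $value) {
    $list[] = array('hash_value' => crc32($key), 'key' => $key, 'value' => $value);
}

// listFind is a hypothetical helper: walk the nodes, compare hashes, then keys.
function listFind(array $list, $key) {
    $h = crc32($key);
    foreach ($list as $node) {
        // Integer comparison first; the string comparison runs only on a hash match.
        if ($node['hash_value'] === $h && $node['key'] === $key) {
            return $node['value'];
        }
    }
    return null; // not found
}

var_dump(listFind($list, "WXYZ")); // string(6) "second"
var_dump(listFind($list, "nope")); // NULL
```

The walk is still linear in the number of nodes; the stored hash only makes each individual comparison cheaper.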

Although hashing is often associated with uses such as encryption, it can also be used in ordinary application code to store simple data, which makes the code much more efficient.
