One implementation of massive data lookup with a hash table: weekend practice

Source: Internet
Author: User

I am happy to be spending the weekend at home with my wife, and I will be cooking lunch. Before cooking, I want to deepen my understanding of hashing by writing this article.

Body: This article simulates a massive-data lookup. It reads a large amount of data from a TXT file (standing in for massive data), builds a hash table, and then takes a keyword (a string) and quickly locates the value to be found. In effect, it searches for a specific string. The code and verification screenshots are shown. The whole process simulates massive-data lookup and gives a real feel for the advantages of hashing when processing large volumes of data.

First, a few screenshots of the program running:

Figure 1

A brief explanation, using the first line of Figure 1: "359th time" is the running count of mappings; "position 992" is the position in the hash array, that is, hashtable[992]; the stored string is the original string, kept after hashing so that it can be compared during lookup; and the value is what we want to locate after the mapping. For example, we may want the probability of a string, or the number of times it appears; the program simply leaves an interface for this. Note: all data is read from the TXT file, and the TXT file is generated by the program itself using the rand() function. For details, refer to the program source code or download the project files posted later. I generated 5000 strings, a number set by a constant (num) in the program; raising that constant adds more strings and brings the simulation closer to reality.

Figure 2

The last line of Figure 2 shows the position of the string being searched for (the part highlighted in blue).

 

I first searched the Internet and found that articles either offered a pile of theory or pasted only the key code, which left me unable to grasp what "massive" really means.

This article is divided into three parts. Part 1: basic knowledge of hashing. Part 2: establishing the hash mapping and resolving conflicts. Part 3: the specific code implementation.

Hashing is mainly used for fast lookup. Its average time complexity is O(1), which makes it one of the fastest search methods. For an entry-level introduction to hashing, refer to my other blog post, Data Structure-Exercise 2 hash table. The most basic idea of hashing is this: the hash function converts the (transformed) ASCII codes of the string to be stored into the subscript of its storage location. Once the hash table has been built, we can locate a string quickly by applying the same hash function mapping, and then retrieve the related information.
Constructing the hash function is therefore one of the most important tasks. Common methods include direct addressing, the mid-square method, the division-remainder method, digit analysis, folding, and the random number method.

The example in this article uses the division-remainder method:

The basic principle is H(key) = key % mod, where mod is a number no larger than the table length. In this example, mod is 1024.
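As a minimal sketch of the division-remainder method described above (the helper name bucketIndex is hypothetical and does not appear in the original program):

```cpp
#include <cassert>

// Hypothetical helper: map a non-negative hash value to a table slot
// using the division-remainder method, H(key) = key % mod.
unsigned int bucketIndex(unsigned int key, unsigned int mod)
{
    assert(mod != 0);   // a modulus of 0 would be a division by zero
    return key % mod;   // the result is always in the range [0, mod - 1]
}
```

With mod = 1024, bucketIndex(2040, 1024) lands in slot 1016. Because 1024 is a power of two, key % 1024 keeps only the low 10 bits of the key, which is one reason many textbooks suggest a prime modulus instead.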

A concrete analysis with code (the SDBM method):

    int SDBMHash(const char* str)
    {
        unsigned int hash = 0;   // unsigned, so that overflow wraps around safely
        while (*str != '\0')
        {
            // Equivalent to: hash = 65599 * hash + (*str++). This is the mapping
            // function; the hash is built from the ASCII code of each character.
            hash = (*str++) + (hash << 6) + (hash << 16) - hash;
        }
        return (int)(hash & 0x7FFFFFFF);
    }

Finally, do not forget to take the remainder: key = key % hashtablelen; // hashtablelen = 1024. Here key is the return value of SDBMHash.
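The claim that the shift form equals multiplication by 65599 can be checked with a short sketch. The function names sdbmShift and sdbmMultiply are hypothetical, introduced only for this comparison:

```cpp
#include <cassert>

// Shift-and-subtract form, following the article's SDBM hash.
unsigned int sdbmShift(const char* s)
{
    unsigned int h = 0;
    while (*s != '\0')
        h = static_cast<unsigned char>(*s++) + (h << 6) + (h << 16) - h;
    return h & 0x7FFFFFFF;
}

// Multiplicative form: (h << 6) + (h << 16) - h == h * (64 + 65536 - 1) == h * 65599.
unsigned int sdbmMultiply(const char* s)
{
    unsigned int h = 0;
    while (*s != '\0')
        h = 65599u * h + static_cast<unsigned char>(*s++);
    return h & 0x7FFFFFFF;
}
```

For the string "ab", both functions give 97 * 65599 + 98 = 6363201, and taking the remainder modulo 1024 puts it in slot 65.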

Conflict resolution is the second important step. The basic methods include the linked list (chaining) method, open addressing, rehashing, and setting up a public overflow area.
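For contrast with the linked list method used in the rest of this article, open addressing can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses linear probing, a tiny table of 8 slots, a simple sum-of-characters hash, and the hypothetical names kSlots, homeSlot, probeInsert and probeFind, none of which come from the original program. Keys are assumed to be non-empty, since an empty string marks a free slot.

```cpp
#include <cassert>
#include <string>

const unsigned int kSlots = 8;   // hypothetical, deliberately small table
std::string slots[kSlots];       // an empty string marks a free slot

// Toy hash for the sketch: sum of character codes modulo the table size.
unsigned int homeSlot(const std::string& key)
{
    unsigned int sum = 0;
    for (unsigned char c : key) sum += c;
    return sum % kSlots;
}

// Linear probing: on a conflict, try the next slot, wrapping around.
// Returns the slot used, or -1 if the table is full.
int probeInsert(const std::string& key)
{
    unsigned int start = homeSlot(key);
    for (unsigned int i = 0; i < kSlots; ++i) {
        unsigned int idx = (start + i) % kSlots;
        if (slots[idx].empty() || slots[idx] == key) {
            slots[idx] = key;
            return static_cast<int>(idx);
        }
    }
    return -1;
}

// Follow the same probe sequence; a free slot means the key is absent.
int probeFind(const std::string& key)
{
    unsigned int start = homeSlot(key);
    for (unsigned int i = 0; i < kSlots; ++i) {
        unsigned int idx = (start + i) % kSlots;
        if (slots[idx].empty()) return -1;
        if (slots[idx] == key) return static_cast<int>(idx);
    }
    return -1;
}
```

Here "a" (97) and "i" (105) both hash to slot 1, so the second one inserted is pushed to slot 2. The table never grows, which is why a workload of 5000 strings in 1024 slots needs chaining or a much larger table.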

This article uses the common linked list method:

On the left is the slot where a key lands the first time. Obviously, two different strings may map to the same key. When that happens, we append a linked list node to the slot to store the new string.

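The code is as follows:

    p = &HashT[key];
    while (p->next != NULL) p = p->next;   // walk to the last node of the chain
    p->next = new hashNode();              // append a node for the conflicting string
    p->next->used = true;
    strcpy(p->next->key, str);
    p->next->value = rand() % 100;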
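The fragment above can be tried in isolation with a minimal chaining sketch. Everything here is an assumption made for illustration: the table has only 4 slots so that conflicts are easy to produce, and the names Node, table, slotOf, chainInsert and chainFind are hypothetical rather than taken from the original program. Nodes are never freed, which is acceptable only in a throwaway sketch.

```cpp
#include <cassert>
#include <string>

const unsigned int kTableLen = 4;   // deliberately tiny so that conflicts are easy to see

struct Node {
    std::string key;
    unsigned int value;
    Node* next;
};

Node* table[kTableLen] = {};        // every slot starts as an empty chain

// SDBM-style hash followed by the division-remainder step.
unsigned int slotOf(const std::string& key)
{
    unsigned int h = 0;
    for (unsigned char c : key)
        h = c + (h << 6) + (h << 16) - h;
    return (h & 0x7FFFFFFF) % kTableLen;
}

// Append to the end of the chain, as the article's fragment does.
// Like the original, this does not check for duplicate keys.
void chainInsert(const std::string& key, unsigned int value)
{
    Node** link = &table[slotOf(key)];
    while (*link != nullptr) link = &(*link)->next;
    *link = new Node{key, value, nullptr};
}

// Walk the chain in the key's slot and compare the stored strings.
const Node* chainFind(const std::string& key)
{
    for (const Node* p = table[slotOf(key)]; p != nullptr; p = p->next)
        if (p->key == key) return p;
    return nullptr;
}
```

With 4 slots, "a" (97) and "e" (101) both map to slot 1, so the second insert is chained behind the first, and both remain findable because the stored key is compared during lookup.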

The code of the complete program:

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <fstream>
    #include <iostream>
    #include <string>
    using namespace std;

    const unsigned int num = 5000;
    const int hashtablelen = 1024;
    int COUNT = 0;   // counts the mappings performed, to verify the hash table

    struct hashNode
    {
        bool used;
        hashNode* next;
        char key[28];
        unsigned int value;
        hashNode() { used = false; next = NULL; key[0] = '\0'; value = 0; }
        hashNode(const char* k, unsigned int v) { used = true; next = NULL; strcpy(key, k); value = v; }
    } HashT[hashtablelen];   // the table length is 1024, so 1024 is also the modulus

    unsigned int elfhash(const char* s);   // function declarations
    int SDBMHash(const char* str);
    void createstrtxt();
    void establishhat();
    void findstr(const char* str);

    int main()
    {
        // createstrtxt();   // generates the TXT file and writes random strings into it
        establishhat();
        // The argument should be a string copied from your own str.txt.
        findstr("QD 'e _ usigh] oscfwsnfreshayug");
        /* char str[30]; ifstream ifs("str.txt"); ifs >> str; findstr(str); */
        return 0;
    }

    int SDBMHash(const char* str)
    {
        unsigned int hash = 0;
        while (*str != '\0')
        {
            hash = (*str++) + (hash << 6) + (hash << 16) - hash;
        }
        return (int)(hash & 0x7FFFFFFF);
    }

    unsigned int elfhash(const char* s)   // alternative hash function, not used below
    {
        unsigned int hash = 0, x = 0;
        while (*s)
        {
            hash = (hash << 4) + (*s++);
            if ((x = hash & 0xF0000000L) != 0)
            {
                hash ^= (x >> 24);
                hash &= ~x;
            }
        }
        return hash & 0x7FFFFFFF;
    }

    void createstrtxt()
    {
        ofstream ofs("str.txt", ios::app);   // ios::app appends on every run
        for (unsigned int i = 0; i < num; ++i)
        {
            // "\n\r", then 27 random characters in the range 91..121, then '\0'
            char temp[30] = {'\n', '\r'};
            for (int j = 2; j < 29; ++j) temp[j] = rand() % 31 + 91;
            ofs << temp;
        }
    }

    void establishhat()   // build the hash table using the linked list method
    {
        hashNode* p;
        int key;
        char str[28];
        ifstream ifs("str.txt");
        while (ifs >> str)
        {
            key = SDBMHash(str);
            key = key % hashtablelen;
            if (HashT[key].used == false)
            {
                strcpy(HashT[key].key, str);
                HashT[key].used = true;
                // The pseudo-random number simulates the value of a key-value pair,
                // that is, the value we are looking for. To check whether a string is
                // stored, the node holds the string itself; to look up the probability
                // of a string, it holds the probability; to look up the number of
                // occurrences, it holds the count. Baidu's top 10 queries work on the
                // same principle.
                HashT[key].value = rand() % 100;
                COUNT++;
                printf("Mapping %d at position %d. Stored string: %s, value: %u\n", COUNT, key, str, HashT[key].value);
            }
            else   // conflict handling
            {
                p = &HashT[key];
                while (p->next != NULL) p = p->next;
                p->next = new hashNode();
                p->next->used = true;
                strcpy(p->next->key, str);
                p->next->value = rand() % 100;
                COUNT++;
                printf("Mapping %d at position %d. Stored string: %s, value: %u\n", COUNT, key, str, p->next->value);
            }
        }
    }

    void findstr(const char* str)
    {
        hashNode* p;
        int key;
        key = SDBMHash(str);
        key = key % hashtablelen;
        if (HashT[key].used == false)
        {
            printf("This string does not exist\n");
        }
        else
        {
            p = &HashT[key];
            while (p != NULL)
            {
                if (!strcmp(p->key, str))
                {
                    printf("Destination string found at location %d: %s\n", key, str);
                    break;
                }
                p = p->next;
            }
            if (p == NULL) printf("Sorry! The string does not match\n");
        }
    }
                          
Note: The findstr() function looks up a specific string. The principle is the same as when building the hash table: hash the string, take the remainder, then compare along the chain. Note: the string written as the function argument should not contain escape characters, because a backslash inside a string literal starts an escape sequence and the string will then no longer match the one in the file. Therefore, pick a string from the TXT file that contains no escape characters, as in this example.
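If a string containing a backslash must be used anyway, one option (an addition here, not something the original program does) is to double the backslash or to use a C++11 raw string literal. The function name sameKey and the sample key are hypothetical:

```cpp
#include <cassert>
#include <cstring>

// Returns true when the escaped literal and the raw literal spell the same characters.
bool sameKey()
{
    const char* escaped = "[\\]^_`abc";   // "\\" in source code is ONE backslash in memory
    const char* raw = R"([\]^_`abc)";     // C++11 raw literal: no escape processing at all
    return std::strcmp(escaped, raw) == 0 && std::strlen(raw) == 9;
}
```

Either form could be passed to findstr() to look up the 9-character key shown, since both hold exactly the characters that would be read from the file.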
createstrtxt() creates the TXT file; num is the number of strings created. Note that the file is opened with ios::app, so if createstrtxt() is called on every run, the number of strings in the TXT file grows by 5000 each time. Therefore, create the file once and then comment the call out. Project files: download them here (a credit is required). Sorry about that; if you have no credits, leave your email address and I will send the files promptly.
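An alternative to commenting the call out is to truncate the file each time it is generated. The sketch below is a hypothetical variant, not the original createstrtxt(): the names writeStrings and countStrings and the file name used in the usage note are assumptions.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>

// Hypothetical variant of createstrtxt(): ios::trunc discards the old
// contents, so running the generator again does not grow the file.
void writeStrings(const char* path, unsigned int n, unsigned int len)
{
    std::ofstream ofs(path, std::ios::out | std::ios::trunc);
    for (unsigned int i = 0; i < n; ++i) {
        std::string s;
        for (unsigned int j = 0; j < len; ++j)
            s += static_cast<char>(std::rand() % 31 + 91);   // same range as the article: 91..121
        ofs << s << '\n';
    }
}

// Count the whitespace-separated strings in the file, the same way
// the article's table-building loop reads them.
unsigned int countStrings(const char* path)
{
    std::ifstream ifs(path);
    std::string s;
    unsigned int n = 0;
    while (ifs >> s) ++n;
    return n;
}
```

Calling writeStrings("demo_str.txt", 10, 27) twice in a row leaves 10 strings in the file, whereas the ios::app version would leave 20.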
