Using hashing for big data lookups

Source: Internet
Author: User

Use hashing to implement a simple lookup over 6,428,633 CSDN account records.

#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N 6428633   // number of records, also used as the bucket count

char path[256] = "e:\\big_data\\csdn.txt";

unsigned int bkdrhash(char *str);

struct beitai {
    char *pstr;             // stored string
    struct beitai *pnext;   // next node
};

struct info {
    struct beitai *pbt;     // head of this bucket's list
};

struct info *pall = NULL;

// Insert: push a copy of str onto the front of the list
struct beitai *addstr(struct beitai *phead, char *str)
{
    struct beitai *pnew = calloc(1, sizeof(struct beitai));  // new node
    size_t length = strlen(str);
    pnew->pstr = calloc(length + 1, sizeof(char));
    strcpy(pnew->pstr, str);    // copy
    pnew->pnext = phead;        // also correct when phead is NULL
    return pnew;
}

// Query: print every record in the list that contains findstr
void find(struct beitai *phead, char *findstr)
{
    while (phead != NULL) {
        char *ps = strstr(phead->pstr, findstr);
        if (ps != NULL) {
            printf("%s", phead->pstr);  // found
        }
        phead = phead->pnext;
    }
}

// Reduce a record to its key: remove spaces, then cut at the first '#'
void changestr(char *str)
{
    char *pbak = str;   // back up the address
    int i = 0;
    int j = 0;
    while ((str[i] = str[j++]) != '\0') {
        if (str[i] != ' ') {
            i++;
        }
    }
    char *p1 = strstr(pbak, "#");   // truncate
    if (p1 != NULL) {
        *p1 = '\0';
    }
}

void init()
{
    pall = malloc(N * sizeof(struct info));
    if (pall == NULL) {
        exit(EXIT_FAILURE);
    }
    memset(pall, 0, N * sizeof(struct info));   // clear
    FILE *pf = fopen(path, "r");
    if (pf == NULL) {
        return;     // file missing: leave the table empty
    }
    for (int i = 0; i < N; i++) {
        char str[100] = { 0 };
        char strbak[100] = { 0 };   // backup
        if (fgets(str, 100, pf) == NULL) {  // read; stop at end of file
            break;
        }
        strcpy(strbak, str);        // copy
        changestr(str);             // string processing
        unsigned int data = bkdrhash(str);
        unsigned int id = data % N;
        pall[id].pbt = addstr(pall[id].pbt, strbak);  // find the list, insert
    }
    fclose(pf);
}

unsigned int bkdrhash(char *str)
{
    unsigned int seed = 13131313;   // 131 1313 13131 131313 etc..
    unsigned int hash = 0;
    while (*str) {
        hash = hash * seed + (*str++);
    }
    return (hash & 0x7FFFFFFF);
}

// Count the lines in the file; returns -1 if it cannot be opened
int getn()
{
    FILE *pf = fopen(path, "r");
    if (pf == NULL) {
        return -1;
    }
    int i = 0;
    char str[100] = { 0 };
    while (fgets(str, 100, pf) != NULL) {   // read
        i++;
    }
    fclose(pf);
    return i;
}

// Query: records whose keys collide share a bucket, so search that bucket's list
int main()
{
    printf("The data has %d rows in total\n", getn());
    init();
    while (1) {
        char str[100] = { 0 };
        if (scanf("%99s", str) != 1) {
            break;  // end of input
        }
        unsigned int id = bkdrhash(str) % N;
        find(pall[id].pbt, str);
    }
    system("pause");
    return 0;
}


Copyright notice: This is the blogger's original article. You are welcome to point out weaknesses in the code and to propose optimizations. Guidance is welcome; I am coding late into the night, updating as fast as I can, and working hard ...
