import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

public class HashComputer {
    private static MessageDigest md5;

    static {
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    /**
     * Get the MD5 of the given key.
     */
    public static byte[] computeMd5(String k) {
        md5.reset();
        byte[] keyBytes = null;
        try {
            keyBytes = k.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Unknown string: " + k, e);
        }
        md5.update(keyBytes);
        return md5.digest();
    }

    // Builds an int from the first four digest bytes (little-endian).
    public static int hash(String str) {
        byte[] bb = computeMd5(str);
        long rv = ((long) (bb[3] & 0xFF) << 24)
                | ((long) (bb[2] & 0xFF) << 16)
                | ((long) (bb[1] & 0xFF) << 8)
                | ((long) (bb[0] & 0xFF) << 0);
        return (int) (rv & 0x7fffffffL); /* truncate to 32 bits, non-negative */
    }

    public static int hash2(String str) {
        int hash = 0;
        int x = 0;

        for (int i = 0; i < str.length(); i++) {
            hash = (hash << 4) + str.charAt(i);
            if ((x = (int) (hash & 0xF0000000L)) != 0) {
                hash ^= (x >> 24);
                hash &= ~x;
            }
        }
        return (hash & 0x7FFFFFFF);
    }

    public static void main(String[] args) {
        int N = 55; // number of buckets
        int aa[] = new int[N];
        for (int i = 0; i < N; i++)
            aa[i] = 0;

        Random ran = new Random();
        for (int i = 0; i < 100000; i++) {
            int length = ran.nextInt(30);
            StringBuffer sb = new StringBuffer(length);
            for (int n = 0; n < length; n++) {
                // random printable ASCII character (codes 32 to 126)
                sb.append((char) (ran.nextInt(95) + 32));
            }
            int l = HashComputer.hash(sb.toString());
            aa[l % N]++;
        }

        for (int i = 0; i < N; i++)
            System.out.println(aa[i]);
    }
}
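After the hash value is taken modulo the number of buckets, the distribution comes out very uneven, which is discouraging. The 13 counts below sum to 100000, so this run evidently used 13 buckets; note the eighth one: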
7422
7365
7483
7319
7610
7280
7654
10627
7495
7473
7625
7341
7306
The reason is that the length of each test string is generated randomly: int length = ran.nextInt(30); and nextInt(30) can return 0. Every zero-length string is the same empty string, so they all produce the same hash value and are all counted in one bucket. About 1 in 30 of the 100000 strings (roughly 3,300) is empty, which matches the excess seen in the overfull bucket. This has nothing to do with the hash function used; it comes from the test data.
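The effect can be checked in isolation. The following is a minimal sketch, not the original test: the class name EmptyStringSkew, the helper names, and the fixed seed 42 are assumptions made for illustration, and 13 buckets is assumed because the output above has 13 counts. It reuses the same byte-packing as hash() and reports how many empty strings nextInt(30) produces and which bucket the empty string falls into.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

public class EmptyStringSkew {
    // Same construction as hash() in the listing: first four MD5 bytes,
    // packed little-endian, with the sign bit cleared.
    static int md5Hash(String s) {
        try {
            byte[] bb = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long rv = ((long) (bb[3] & 0xFF) << 24)
                    | ((long) (bb[2] & 0xFF) << 16)
                    | ((long) (bb[1] & 0xFF) << 8)
                    | (long) (bb[0] & 0xFF);
            return (int) (rv & 0x7fffffffL);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Counts how many of `samples` draws of nextInt(bound) are 0, i.e. how
    // many empty strings the test loop would generate.
    static int countZeroLengths(Random ran, int samples, int bound) {
        int zeros = 0;
        for (int i = 0; i < samples; i++) {
            if (ran.nextInt(bound) == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    public static void main(String[] args) {
        int buckets = 13; // assumption: matches the 13 counts in the output
        int zeros = countZeroLengths(new Random(42), 100000, 30);
        System.out.println("empty strings generated: " + zeros);
        // MD5("") starts with bytes d4 1d 8c d9, so this prints index 7
        System.out.println("bucket of the empty string: " + md5Hash("") % buckets);
    }
}
```

Index 7 is the eighth bucket, which is where the count 10627 appears in the output above.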
The solution is to change
int length = ran.nextInt(30);
to
int length = ran.nextInt(20) + 10;
so that every test string has between 10 and 29 characters and no empty strings are generated.
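To compare the two length settings side by side, here is a rough sketch under stated assumptions: the class BucketSpread, its methods, the seed, and the choice of 13 buckets are all hypothetical and only serve the illustration. It runs the same experiment with both length formulas and prints the gap between the fullest and emptiest bucket.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

public class BucketSpread {
    // Generates `samples` random printable-ASCII strings of length
    // minLen + nextInt(lenRange), hashes them like hash() in the listing,
    // and returns the per-bucket counts.
    static int[] distribute(Random ran, int samples, int buckets,
                            int minLen, int lenRange) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        int[] counts = new int[buckets];
        for (int i = 0; i < samples; i++) {
            int length = minLen + ran.nextInt(lenRange);
            StringBuilder sb = new StringBuilder(length);
            for (int n = 0; n < length; n++) {
                sb.append((char) (ran.nextInt(95) + 32));
            }
            byte[] bb = md5.digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            long rv = ((long) (bb[3] & 0xFF) << 24)
                    | ((long) (bb[2] & 0xFF) << 16)
                    | ((long) (bb[1] & 0xFF) << 8)
                    | (long) (bb[0] & 0xFF);
            counts[(int) (rv & 0x7fffffffL) % buckets]++;
        }
        return counts;
    }

    // Difference between the fullest and the emptiest bucket.
    static int spread(int[] counts) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return max - min;
    }

    public static void main(String[] args) {
        int[] skewed = distribute(new Random(1), 100000, 13, 0, 30);  // nextInt(30)
        int[] fixed = distribute(new Random(1), 100000, 13, 10, 20);  // nextInt(20) + 10
        System.out.println("spread with nextInt(30):      " + spread(skewed));
        System.out.println("spread with nextInt(20) + 10: " + spread(fixed));
    }
}
```

With lengths starting at 0, the spread should be dominated by the few thousand empty strings in one bucket; with the minimum length of 10 it should shrink to ordinary sampling noise.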