I recently wanted to use a hash function to generate a unique value from a string, but I have not found a good one. The values produced by PHP's built-in hash functions such as MD5 and SHA are too long for my purposes. I have thought about truncating them, but I am worried about hash collisions. Could someone more experienced recommend a hash function to use?
Reply content:
Which hash algorithm to use depends on the data, but if even MD5 is too long, that is a hard requirement to meet. I would really like to hear why "the value generated by the hash function is too long for me to use." If that is just a feeling, with no actual assessment behind it, then it is asking too much.
What you need may not be a low-collision hash algorithm, but a string compression step that re-encodes the output of the hash algorithm. Hash output is normally printed in hexadecimal, a character set of only 16 symbols, while ASCII still has 94 printable characters once the space is excluded. If the goal is simply a shorter string, encoding the same hash value in a larger character set should be enough to satisfy you.
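To illustrate the re-encoding idea, here is a minimal Python sketch (the same approach works in PHP). The function names are made up for this example. It encodes the same 128-bit MD5 value in hexadecimal, in a 64-character URL-safe alphabet, and in an 85-character alphabet, so the string gets shorter without discarding any bits of the hash.

```python
import base64
import hashlib


def md5_hex(text: str) -> str:
    """MD5 as the usual 32-character hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def md5_base64(text: str) -> str:
    """Same 128-bit MD5 value in a 64-character URL-safe alphabet."""
    digest = hashlib.md5(text.encode("utf-8")).digest()  # 16 raw bytes
    # Strip the "=" padding; it carries no information for a fixed-size digest.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def md5_base85(text: str) -> str:
    """Same 128-bit MD5 value in an 85-character alphabet."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return base64.b85encode(digest).decode("ascii")


if __name__ == "__main__":
    for encode in (md5_hex, md5_base64, md5_base85):
        value = encode("hello")
        print(encode.__name__, len(value), value)
```

For a 16-byte digest this gives 32, 22, and 20 characters respectively. The collision probability is unchanged, because only the representation differs.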
The shorter the output (that is, the smaller the range of the hash), the higher the probability of a collision; do not have any unrealistic illusions about this. In other words, whatever algorithm you use, shrinking the range by the same amount raises the collision probability by the same amount. So if you really need a hash function with a short output, there is no need to go looking for a special one: truncating an existing hash is effective enough.
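A minimal Python sketch of truncation, assuming SHA-256 as the underlying hash (any well-distributed hash would do). The helper names `short_hash` and `count_collisions` and the `item-N` test strings are invented for this illustration. The second function counts how often distinct inputs end up with the same truncated value.

```python
import hashlib


def short_hash(text: str, hex_chars: int = 8) -> str:
    """Return the first `hex_chars` hex characters of the SHA-256 digest."""
    if not 1 <= hex_chars <= 64:
        raise ValueError("hex_chars must be between 1 and 64")
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:hex_chars]


def count_collisions(n_inputs: int, hex_chars: int) -> int:
    """Count inputs whose truncated hash was already produced by an earlier input."""
    seen = set()
    collisions = 0
    for i in range(n_inputs):
        h = short_hash(f"item-{i}", hex_chars)
        if h in seen:
            collisions += 1
        seen.add(h)
    return collisions


if __name__ == "__main__":
    # Shorter prefixes leave fewer possible values, so collisions become more frequent.
    for chars in (2, 4, 8):
        print(chars, count_collisions(5000, chars))
```

With only 2 hex characters there are just 256 possible values, so almost every one of 5000 inputs collides; each extra character multiplies the range by 16.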
The title is not good either. Hash collisions are inevitable, and there is no such thing as "solving" hash collisions.
A hash collision is a low-probability event and need not be feared, but it cannot be avoided, and above all it must not be treated as "non-existent". Depending on the needs of the application, there must be a clear way to handle it. My advice is either to enlarge the value space of the hash algorithm, or to add other comparison features as an extra supplement to the hash.
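One possible way to combine both pieces of advice, sketched in Python under stated assumptions: the class `ShortIdRegistry` is hypothetical, it uses SHA-256, and it keeps everything in an in-memory dictionary. It stores the original string next to each short ID (the extra comparison feature) and lengthens the ID when two different strings share a prefix (enlarging the value space).

```python
import hashlib


class ShortIdRegistry:
    """Hypothetical helper: maps strings to short IDs and resolves collisions."""

    def __init__(self, min_chars: int = 6):
        self.min_chars = min_chars
        self._by_id = {}  # short id -> original string

    def register(self, text: str) -> str:
        full = hashlib.sha256(text.encode("utf-8")).hexdigest()
        for n in range(self.min_chars, len(full) + 1):
            short = full[:n]
            owner = self._by_id.get(short)
            if owner is None:
                # Unused ID: claim it for this string.
                self._by_id[short] = text
                return short
            if owner == text:
                # The same string was registered before: reuse its ID.
                return short
            # A different string owns this prefix: a real collision,
            # so try again with one more character.
        raise RuntimeError("full digests collide; cannot assign a unique ID")


if __name__ == "__main__":
    registry = ShortIdRegistry(min_chars=1)
    ids = [registry.register(f"item-{i}") for i in range(100)]
    print(len(set(ids)), max(len(i) for i in ids))
```

In a real application the dictionary would typically be a database table with a unique index on the short ID, but the comparison against the stored original is the part that matters.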
The longer the output, the less likely it is to collide. Reducing the length inevitably increases the chance of a collision, because you are mapping the original space into the space of hash strings, and the length of the string determines the size of that space.
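The relationship can be made concrete with the standard birthday approximation, assuming the hash spreads inputs uniformly. The symbols are introduced here for illustration: $b$ is the size of the character set, $L$ the output length, $N$ the number of possible outputs, and $n$ the number of distinct inputs.

```latex
N = b^{L}, \qquad
p_{\text{collision}}(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right), \qquad
n_{50\%} \approx 1.18\,\sqrt{N}
```

For example, 8 hexadecimal characters give $N = 16^{8} = 2^{32}$, so a collision becomes more likely than not after roughly 77,000 inputs. A full 128-bit MD5 value gives $N = 2^{128}$, which pushes that point to about $2.2 \times 10^{19}$ inputs.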