Common hash algorithms

Introduction
By definition, a hash function can be used to implement a pseudo-random number generator (PRNG). From this perspective, hash functions can be compared by measuring how closely their output resembles pseudo-random output. Common analytical techniques, such as the Poisson distribution, can be used to analyze the collision rates of different hash functions on different data.

In general, there is a theoretically perfect hash function for any type of data: one that produces no collisions, meaning no duplicate hash values ever occur. In reality it is difficult to find such a perfect hash function, and its practical use is quite limited, since it is tied to one particular data set. In practice, the closest thing to a perfect hash function is recognized to be the one that generates the fewest collisions on a particular data set.

The problem is that data comes in many types: some is highly random, while some is highly structured, which makes it very difficult to find a general-purpose hash function; even for a single type of data, it is not easy to find a good one. All we can do is find a hash function that satisfies our requirements by trial and error. A hash function can be evaluated from the following two angles:

1. Data distribution
One measure is whether a hash function distributes the hash values of a set of data well. To perform this analysis, you need to know the number of colliding hash values: if collisions are handled with linked lists (chaining), you can analyze the average chain length, or you can count the number of distinct hash values produced. (A rough measurement sketch appears below.)

2. Efficiency
The other measure is how efficiently the hash function computes a hash value. Algorithms that use a hash function usually assume it costs O(1); this is why lookup in a hash table is said to have "average O(1) complexity", whereas other common data structures, such as maps (usually implemented as red-black trees), are considered O(log n). A good hash function is, in theory, required to be very fast, stable, and verifiable. A hash function can rarely achieve true O(1) complexity, since computing a string hash is linear in the string's length, but string hashes are still very fast in practice, and the input to a hash function is usually a small primary-key identifier, so the whole process should be fast and, to some extent, stable.

The hash functions described in this article are called simple hash functions. They are typically used to hash string data and to produce keys for associative containers such as hash tables. These hash functions are not cryptographically secure: different data can easily be made to produce exactly the same hash value by reversing or combining inputs.
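As a rough illustration of the first measure, here is a minimal sketch that buckets a set of strings and reports how many buckets are used and the longest chain, assuming chaining resolves collisions. The hash function and key set are placeholders; any algorithm from this article could be dropped in.

    #include <stdio.h>

    #define NUM_BUCKETS 64

    /* Placeholder hash: a simple multiplicative string hash. */
    static unsigned int toy_hash(const char *s) {
        unsigned int h = 0;
        while (*s)
            h = h * 31u + (unsigned char)*s++;
        return h;
    }

    int main(void) {
        const char *keys[] = { "apple", "banana", "cherry", "date",
                               "elderberry", "fig", "grape", "honeydew" };
        size_t n = sizeof(keys) / sizeof(keys[0]);
        int counts[NUM_BUCKETS] = {0};

        /* Tally how many keys land in each bucket. */
        for (size_t i = 0; i < n; i++)
            counts[toy_hash(keys[i]) % NUM_BUCKETS]++;

        int used = 0, longest = 0;
        for (int b = 0; b < NUM_BUCKETS; b++) {
            if (counts[b] > 0) used++;
            if (counts[b] > longest) longest = counts[b];
        }
        printf("%zu keys, %d buckets used, longest chain %d\n", n, used, longest);
        return 0;
    }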
Hashing Methodology
Hash functions are usually classified by the method they use to produce a hash value. There are two main methods (a sketch of both styles follows this list):

1. Hashing based on addition and multiplication

This works by traversing the elements of the data and adding, to an initial value, a quantity derived from each element. That quantity is usually computed by multiplying the element's value by a prime number.

2. Shift-based hashing

Like an additive hash, a shift-based hash makes use of every element of the string data, but instead of adding, it relies on bit-shift operations, usually a combination of left and right shifts, where the shift amounts are often prime numbers. The result of each shift step is simply accumulated, and the final accumulated value is the result.
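A minimal sketch of both styles, assuming byte-wise processing of a C string; the multiplier 31 and the 4-bit rotation are illustrative choices, not prescribed values:

    #include <stdint.h>

    /* 1. Additive/multiplicative: fold each byte into a running value,
       scaling by a small prime at every step. */
    uint32_t additive_hash(const char *s) {
        uint32_t h = 0;
        while (*s)
            h = h * 31u + (uint8_t)*s++;
        return h;
    }

    /* 2. Shift-based (rotating): rotate the accumulator, then fold in
       the next byte with XOR. ((h << 4) | (h >> 28)) is a 4-bit left
       rotation of a 32-bit value. */
    uint32_t rotating_hash(const char *s) {
        uint32_t h = 0;
        while (*s)
            h = ((h << 4) | (h >> 28)) ^ (uint8_t)*s++;
        return h;
    }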


Hash functions and primes

No one has proved the relationship between prime numbers and pseudo-random number generators, but for now the best known results use primes. Pseudo-random number generation is currently a statistical matter rather than an exact science: analyzing a generator only gives some understanding of the overall results, without revealing how those results come about. If more concrete research could be carried out, perhaps we would better understand which values are most effective, why primes are more effective than other numbers, and why some primes do not work at all. If we could answer these questions with reproducible proofs, we could design better pseudo-random number generators, and perhaps better hash functions as well.

The basic idea behind using primes in a hash function is to use a prime number, rather than other kinds of numbers, to change the state value of the hash function as it processes input. "Process" here means performing simple operations on the hash value, such as multiplication and addition. The resulting new hash value should have higher statistical entropy, that is, no bit bias. Simply put, when you multiply a set of random numbers by a prime, the probability that any given bit of the result is 1 should be close to 0.5. There is no concrete proof that this phenomenon occurs only with primes; it seems to be a self-justifying intuition that some in the industry have chosen to follow.
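One concrete piece of this intuition is easy to check: an even multiplier forces the low bit of every product to 0, destroying entropy there, while an odd multiplier (primes in particular) preserves it. A small sketch, using a tiny xorshift32 generator for test inputs and Knuth's well-known multiplicative constant 2654435761 as the odd multiplier:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t x = 2463534242u;   /* xorshift32 state (any nonzero seed) */
        long ones_odd = 0, ones_even = 0;
        const long samples = 100000;

        for (long i = 0; i < samples; i++) {
            /* Marsaglia's xorshift32 step, used here only to make inputs. */
            x ^= x << 13; x ^= x >> 17; x ^= x << 5;
            ones_odd  += (x * 2654435761u) & 1u;  /* odd multiplier: LSB keeps its entropy */
            ones_even += (x * 2654435760u) & 1u;  /* even multiplier: LSB is always 0 */
        }
        printf("LSB set: odd multiplier %.3f, even multiplier %.3f\n",
               (double)ones_odd / samples, (double)ones_even / samples);
        return 0;
    }

The first ratio comes out near 0.5; the second is exactly 0.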

Deciding which methods to combine, and which primes to use with them, is still very much a black art. No single method can claim to be the ultimate general-purpose hash function. The best one can do is evolve a hashing algorithm by trial and error until it satisfies the statistical analysis it is measured against.


Bit bias

A bit-sequence generator, whether purely random or deterministic to some degree, produces each bit in one state or the opposite state with a certain probability; that probability is the bit bias. In the purely random case, the probability of any position being high or low should be 50%.

In a pseudo-random generator, the algorithm determines the bit bias of the generator's smallest output block.


Suppose a PRNG produces 8-bit output blocks, and for some reason the most significant bit (MSB) is always set high: the MSB's bit bias is then 100% toward the high state. The conclusion is that even though this PRNG has 256 possible values, values less than 128 will never be produced. Assuming for simplicity that the other bits are generated purely at random, there is an equal chance of any value between 128 and 255 occurring, but a 0% chance of any value below 128.
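A sketch of how such a bias could be measured, using a deliberately broken 8-bit generator whose MSB is always set (the generator and sample count are illustrative):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* A deliberately biased 8-bit generator: the MSB is forced high. */
    static uint8_t biased_prng(void) {
        return (uint8_t)((rand() & 0xFF) | 0x80);
    }

    int main(void) {
        long set_count[8] = {0};
        const long samples = 100000;

        for (long i = 0; i < samples; i++) {
            uint8_t v = biased_prng();
            for (int bit = 0; bit < 8; bit++)
                if (v & (1u << bit))
                    set_count[bit]++;
        }
        /* An unbiased bit prints close to 0.5; the stuck MSB (bit 7) prints 1.0. */
        for (int bit = 0; bit < 8; bit++)
            printf("bit %d: %.3f\n", bit, (double)set_count[bit] / samples);
        return 0;
    }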

All PRNGs, whether hash functions, ciphers, m-sequences, or any other generator that produces a bit stream, exhibit such a bit bias. Most PRNGs try to converge the bias toward a certain value; stream ciphers are one example, while for other generators an indeterminate bit bias is preferable.

Mixing, or bit-sequence scrambling, is a method for producing a common, equal bias across the positions of a stream, although care must be taken to ensure that the mixing does not diverge toward a biased state. A form of mixing used in cryptography is known as avalanching: a block of bits is substituted or permuted with another block, and the result is mixed with further blocks to produce the output.

As the accompanying illustration suggests, an avalanche process begins with one or more blocks of binary data. Tier i of the data is produced by applying certain bit operations to the data (usually input-sensitive, bit-reducing logic). The process is then repeated on the tier i data to generate tier i+1, whose width is less than or equal to that of the tier before it.

This iterative process results in every bit of the final data depending on all the bits of the original data. Note that this is a simple generalization; an avalanche process does not necessarily take exactly this form.
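For a concrete picture of such a mixing step, the 32-bit finalizer from MurmurHash3 (a published, non-cryptographic hash) alternates xor-shifts with multiplications until every input bit influences every output bit:

    #include <stdint.h>

    /* MurmurHash3's fmix32 finalizer: a well-known avalanche mix.
       Each xor-shift spreads high bits downward; each multiplication
       spreads low bits upward, so flipping one input bit flips roughly
       half of the output bits. */
    uint32_t fmix32(uint32_t h) {
        h ^= h >> 16;
        h *= 0x85ebca6bu;
        h ^= h >> 13;
        h *= 0xc2b2ae35u;
        h ^= h >> 16;
        return h;
    }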

Various forms of hashing

Hashing is a real-world tool for mapping data to an identifier. The following are some common application areas of hash functions:

1. String hashing

Used in the field of data storage, primarily for indexing data and as structural support for containers such as hash tables.

2. Cryptographic hashing

Used for data/user verification and authentication. A strong cryptographic hash function makes it difficult to recover the original data from the result. Cryptographic hash functions are used to hash users' passwords, so that the server stores something hard to reverse in place of the password itself. A cryptographic hash function can also be viewed as an irreversible compression function: it represents a large amount of data with a single small value, so it is useful for judging whether data has been tampered with (MD5, for example) or as a data signature proving the authenticity of a file delivered by other means.
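As a small usage sketch, assuming OpenSSL is available (and noting that MD5 appears here only because the text mentions it; it is no longer considered secure), a digest that changes whenever the data changes can be computed like this:

    #include <stdio.h>
    #include <string.h>
    #include <openssl/md5.h>   /* link with -lcrypto */

    int main(void) {
        const char *data = "hello world";
        unsigned char digest[MD5_DIGEST_LENGTH];

        /* One-shot digest: any change to `data` yields a different digest,
           which is how tampering is detected. */
        MD5((const unsigned char *)data, strlen(data), digest);

        for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
            printf("%02x", digest[i]);
        printf("\n");
        return 0;
    }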

3. Geometric hashing

This form of hashing is used in the field of computer vision for the detection of objects in arbitrary scenes. The process initially involves selecting a region or object of interest. From there, a feature detection algorithm such as the Harris corner detector (HCD), the scale-invariant feature transform (SIFT), or speeded-up robust features (SURF) is used to extract a set of affine-invariant features representing the object or region. This set is sometimes called a macro-feature or a feature constellation. Depending on the nature of the features found and the type of object or region they describe, it may still be possible to match two constellations of features even if the two sets differ slightly (for example, through missing or outlying features). The constellations are then said to be the classified set of features.
A hash value is then computed from the constellation of features. This is typically done by first defining the space in which the hash values will live; the hash value in this case is a multidimensional value, normalized for the defined space. Coupled with the process of computing the hash value, another process is necessary: determining the distance between two hash values. A distance measure is required, rather than a deterministic equality operator, because of the possible differences between the constellations that produced the hash values. Also, because a metric as simple as Euclidean distance is inherently ineffective in dealing with the non-linear nature of such spaces, automatically determining a distance metric for a particular space has become an active area of academic research.
Typical examples of geometric hashing include the re-detection and classification of various kinds of vehicles in arbitrary scenes. The level of detection can vary from simply testing whether something is a vehicle, to identifying a particular model of vehicle, to identifying one specific vehicle.

4. Bloom filter
A Bloom filter allows a very large range of values to be represented with a small amount of memory. In computer science it is known for its support of membership queries, the core concept of associative containers. A Bloom filter is implemented with multiple distinct hash functions, and it trades space for a permitted error probability in its membership query results. A Bloom filter guarantees that a membership query will never produce a false negative, but it may produce false positives. The false-positive probability can be controlled by adjusting the size of the table the Bloom filter uses and the number of distinct hash functions.
Subsequent research on hash functions, hash tables, and Bloom filters, by Mitzenmacher and others, suggests that for most practical uses of such structures, the entropy in the data being hashed contributes to the entropy of the hash functions. This theoretical work leads to an optimal Bloom filter (one that, for a given table size, yields the smallest false-positive probability, or vice versa) that can provide a user-defined false-positive probability built from as few as two distinct pairwise-independent hash functions, greatly improving the efficiency of membership queries.
Bloom filters typically appear in applications such as spell checkers, string-matching algorithms, network packet analysis tools, and network/Internet caches.
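A minimal Bloom filter sketch using two simple, independent string hashes and a fixed-size bit table; the table size and hash choices are illustrative, not tuned for any particular false-positive rate:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define TABLE_BITS 1024

    static uint8_t table[TABLE_BITS / 8];

    /* Two simple, independent string hashes (FNV-1a and DJB2 style). */
    static uint32_t h1(const char *s) {
        uint32_t h = 2166136261u;
        while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
        return h;
    }
    static uint32_t h2(const char *s) {
        uint32_t h = 5381u;
        while (*s) h = h * 33u + (uint8_t)*s++;
        return h;
    }

    /* Set the two bits chosen by the hashes. */
    static void bf_add(const char *s) {
        uint32_t a = h1(s) % TABLE_BITS, b = h2(s) % TABLE_BITS;
        table[a / 8] |= (uint8_t)(1u << (a % 8));
        table[b / 8] |= (uint8_t)(1u << (b % 8));
    }

    /* false => definitely absent; true => possibly present (false positives allowed). */
    static bool bf_query(const char *s) {
        uint32_t a = h1(s) % TABLE_BITS, b = h2(s) % TABLE_BITS;
        return (table[a / 8] & (1u << (a % 8))) && (table[b / 8] & (1u << (b % 8)));
    }

    int main(void) {
        bf_add("apple");
        bf_add("banana");
        printf("apple:  %d\n", bf_query("apple"));   /* 1: was added */
        printf("cherry: %d\n", bf_query("cherry"));  /* 0 unless a false positive occurs */
        return 0;
    }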
Common hash functions
A general-purpose hash function library contains string hashing algorithms such as the following, which mix additive and bitwise operations. The algorithms differ in their usage and quality, but all of them serve as examples of how hashing algorithms are implemented. (For the other versions of the code, see the original download.)
1. RS
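The original page's code listings were only available as a download; as a reference, the RS hash as it is commonly published (attributed to Robert Sedgewick's "Algorithms in C") looks like this:

    #include <stdint.h>
    #include <stddef.h>

    /* RS hash: a multiplicative hash whose multiplier itself evolves
       (a is repeatedly scaled by b) as the string is consumed. */
    uint32_t rs_hash(const char *str, size_t len) {
        uint32_t b = 378551u;
        uint32_t a = 63689u;
        uint32_t hash = 0;

        for (size_t i = 0; i < len; i++) {
            hash = hash * a + (uint8_t)str[i];
            a *= b;
        }
        return hash;
    }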
