The Java learning Hash

Source: Internet
Author: User


A hash (the operation is also called hashing) transforms an input of arbitrary length (also called the pre-image) into a fixed-length output by means of a hash algorithm; that output is the hash value. The transformation is a compression mapping: the space of hash values is usually much smaller than the input space, so different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value. Put simply, a hash function compresses a message of any length into a message digest of fixed length.

Basic concepts of hash functions (computer algorithms)
* If the structure contains a record whose key equals K, that record must be at storage location f(K), so it can be retrieved directly without any comparisons. The correspondence f is called the hash function, and a table built on this idea is called a hash table.
* Different keys may yield the same hash address, i.e. key1 ≠ key2 but f(key1) = f(key2). This is called a collision. Keys with the same function value are called synonyms with respect to that hash function. In summary, a hash function H(key) together with a collision-handling method maps a set of keys onto a finite, contiguous range of addresses, and the "image" of each key in that range is used as the storage location of its record. Such a table is called a hash table, the mapping process is called hashing (or building the hash table), and the resulting storage location is called the hash address.
* If, for any key in the key set, the hash function is equally likely to map it to any address in the address set, it is called a uniform hash function. In effect, hashing gives each key a "random address", which reduces collisions.
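To make the notions of collision and synonym concrete, here is a minimal Java sketch. It relies on the documented behaviour of String.hashCode(); the class name CollisionDemo, the helper bucketOf and the table size 16 are assumptions chosen only for illustration.

```java
final class CollisionDemo {
    /** Maps a key to a hash address in [0, tableSize) from the key's hashCode. */
    static int bucketOf(String key, int tableSize) {
        // floorMod keeps the address non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        String key1 = "Aa";
        String key2 = "BB";
        // key1 != key2, yet both hash to 65*31 + 97 = 66*31 + 66 = 2112.
        System.out.println(key1.equals(key2));                  // false
        System.out.println(key1.hashCode() == key2.hashCode()); // true
        // Equal hash codes necessarily lead to the same hash address.
        System.out.println(bucketOf(key1, 16) == bucketOf(key2, 16)); // true
    }
}
```

"Aa" and "BB" are therefore synonyms under this hash function, and any table using it needs a collision-handling method to store both.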
Properties
All hash functions share the following basic property: if two hash values (computed by the same function) differ, then the two original inputs differ. This follows from the hash function being deterministic. On the other hand, the mapping from inputs to outputs is not one-to-one: if two hash values are equal, the two inputs are probably equal, but this is not guaranteed. If you compute the hash of some data and then change part of the input, a hash function with strong mixing produces a completely different hash value.

A typical hash function has an infinite domain, such as byte strings of any length, and a finite range, such as fixed-length bit strings. In some cases a hash function can be designed as a one-to-one correspondence between a domain and a range of the same size. A one-to-one hash function is also called a permutation. Reversibility can be obtained by applying a series of reversible "mixing" operations to the input.

Common hash functions
• Direct remainder method: f(x) := x mod maxM, where maxM is usually a prime that is not too close to a power of two (2^t).
• Multiplicative rounding method: f(x) := trunc((x / maxX) * maxlongit) mod maxM, mainly used for real numbers.
• Mid-square method: f(x) := (x * x div 1000) mod 1000000; the middle digits of the square are taken because each of them carries more information.

Construction methods
A hash function makes access to a data sequence more efficient, because data elements can be located more quickly through it. (For the detailed construction methods, refer to "methods of constructing a hash table".)
1. Direct addressing: take the key itself, or the value of a linear function of the key, as the hash address. That is, H(key) = key or H(key) = a * key + b, where a and b are constants (such a hash function is called a self function).
2. Digit analysis method
3. Mid-square method
4. Folding method
5. Random number method
6. Division (remainder) method: take the remainder of the key divided by a number p that is no greater than the hash table length m as the hash address. That is, H(key) = key MOD p, with p <= m. The modulo can be applied to the key directly, or after a folding or mid-square operation. The choice of p matters a great deal: it is usually a prime or m itself, and a poor choice of p readily produces synonyms.

Collision handling
1. Open addressing: Hi = (H(key) + di) MOD m, i = 1, 2, ..., k (k <= m - 1), where H(key) is the hash function, m is the hash table length, and di is an increment sequence that can be chosen in three ways:
   1) di = 1, 2, 3, ..., m - 1, called linear probing;
   2) di = 1^2, -1^2, 2^2, -2^2, 3^2, ..., ±k^2 (k <= m/2), called quadratic probing;
   3) di = a pseudo-random number sequence, called pseudo-random probing.
2. Rehashing: Hi = RHi(key), i = 1, 2, ..., k, where the RHi are different hash functions. When synonyms collide on an address, another hash function is used to compute a new address, until no collision occurs. This method is less prone to clustering but increases computation time.
3. Separate chaining (the "zipper" method)
4. A common overflow area

Lookup performance analysis
Looking up a key in a hash table is essentially the same process as building the table. Some keys can be found directly at the address produced by the hash function; others collide at that address and must be located using the collision-handling method. In the three methods described for dealing with conflicts, a lookup after a collision is still a process of comparing the given value with stored keys. The efficiency of a hash table lookup is therefore still measured by the average search length.
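The division method and open addressing with linear probing described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not a production table: the class name LinearProbingTable and its methods are hypothetical, keys are plain int values, p is taken equal to the table length m, and deletion is not supported.

```java
final class LinearProbingTable {
    private final Integer[] slots; // null marks an empty slot
    private int size;

    LinearProbingTable(int m) {
        slots = new Integer[m];
    }

    /** Division method: H(key) = key MOD m (floorMod keeps it non-negative). */
    private int home(int key) {
        return Math.floorMod(key, slots.length);
    }

    /** Inserts key and returns the number of slots probed (1 = no collision). */
    int insert(int key) {
        int m = slots.length;
        int h = home(key);
        // d = 0 is the home address; d = 1, 2, ..., m-1 is the linear increment d_i.
        for (int d = 0; d < m; d++) {
            int i = (h + d) % m;
            if (slots[i] == null) {
                slots[i] = key;
                size++;
                return d + 1;
            }
            if (slots[i] == key) {
                return d + 1; // already present, nothing to store
            }
        }
        throw new IllegalStateException("hash table is full");
    }

    /** Looks up key; probing stops at the first empty slot. */
    boolean contains(int key) {
        int m = slots.length;
        int h = home(key);
        for (int d = 0; d < m; d++) {
            Integer s = slots[(h + d) % m];
            if (s == null) {
                return false;
            }
            if (s == key) {
                return true;
            }
        }
        return false;
    }

    /** Number of stored elements divided by the table length. */
    double loadFactor() {
        return (double) size / slots.length;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(11);
        System.out.println(table.insert(22)); // home slot 0 is free: 1 probe
        System.out.println(table.insert(33)); // synonym of 22: 2 probes
        System.out.println(table.contains(33));
    }
}
```

The probe count returned by insert is the number of key comparisons or empty-slot checks needed, which is the quantity the average search length is built from.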
During a lookup, the number of key comparisons depends on how many collisions occur: few collisions mean efficient lookups, many collisions mean inefficient ones. The factors that affect the number of collisions are therefore the factors that affect lookup efficiency. There are three of them:
1. Whether the hash function is uniform;
2. The collision-handling method;
3. The load factor of the hash table.

The load factor of a hash table is defined as α = (number of elements in the table) / (length of the hash table). It indicates how full the table is. Since the table length is fixed, α is proportional to the number of elements in the table: the larger α is, the more elements the table holds and the more likely collisions become; the smaller α is, the less likely they are. In fact, the average search length of a hash table is a function of the load factor α, but the function differs for each collision-handling method.

With the basic definition of hashing in place, we cannot avoid mentioning some well-known hash algorithms. MD5 and SHA-1 are arguably the most widely used, and both are designed on the basis of MD4.

Commonly used hash algorithms:
(1) MD4
MD4 (RFC 1320) was designed by Ronald L. Rivest of MIT in 1990; MD stands for Message Digest. It is intended for fast software implementation on processors with 32-bit words, and is built from bitwise operations on 32-bit operands.
(2) MD5
MD5 (RFC 1321) is Rivest's improved version of MD4, published in 1991. It still processes the input in 512-bit blocks, and its output is the concatenation of four 32-bit words, the same as MD4. MD5 is more complex than MD4 and somewhat slower, but more secure, with better resistance to cryptanalysis and to differential attacks.
(3) SHA-1 and others
SHA-1 was designed by NIST and the NSA for use with DSA. For inputs shorter than 2^64 bits it produces a 160-bit hash value, and therefore offers better resistance to brute-force attacks. The design of SHA-1 is based on the same principles as MD4 and is modelled on that algorithm.

Because hash functions are used in so many different ways, they are often designed for one particular application. A cryptographic hash function, for example, assumes an adversary who deliberately tries to find another input with the same hash value. A well-designed cryptographic hash function is a "one-way" operation: there is no practical way to compute an original input from a given hash value, which makes forgery difficult. Functions designed for cryptographic hashing, such as MD5, are also widely used as checksum functions. When software is downloaded, the file is compared against its verification code so that only the correct file is accepted. This code may vary with environmental factors, such as a change in machine configuration or IP address, which helps protect the source files.

Error detection and correction functions are primarily used to identify data that has been disturbed by random processes. When a hash function is used as a checksum, a relatively short hash value can verify whether data of any length has been altered.

Error correction
A hash function can be used to detect errors that occur while data is being transmitted. The sender applies the hash function to the data to be sent and transmits the result together with the original data. The receiver applies the same hash function to the data it received; if the two results differ, the data was corrupted somewhere in transit. This is called a redundancy check.
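The redundancy check just described can be sketched with the JDK's java.security.MessageDigest class. The class name DigestCheck and its helper methods are hypothetical names used only for this illustration. MD5 and SHA-1 are used because the text discusses them; both are no longer considered collision resistant, and "SHA-256" can be passed as the algorithm name instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestCheck {
    /** Sender side: computes the fixed-length digest of data. */
    static byte[] digest(String algorithm, byte[] data) {
        try {
            return MessageDigest.getInstance(algorithm).digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }

    /** Receiver side: recomputes the digest and compares it with the one sent. */
    static boolean verify(String algorithm, byte[] received, byte[] sentDigest) {
        return MessageDigest.isEqual(digest(algorithm, received), sentDigest);
    }

    /** Renders a digest as lowercase hexadecimal. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] message = "hello, hash".getBytes(StandardCharsets.UTF_8);
        byte[] sent = digest("SHA-1", message);

        // Output length is fixed regardless of input length: 16 bytes for MD5, 20 for SHA-1.
        System.out.println(digest("MD5", message).length);
        System.out.println(sent.length);

        byte[] corrupted = message.clone();
        corrupted[0] ^= 0x01; // flip one bit "in transit"
        System.out.println(verify("SHA-1", message, sent));   // true
        System.out.println(verify("SHA-1", corrupted, sent)); // false
    }
}
```

A check like this detects accidental corruption; it only protects against deliberate tampering if the digest itself reaches the receiver over a channel the attacker cannot modify.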
For error correction, a distribution of likely perturbations is assumed, at least approximately. Perturbations of a message string fall into two classes: large (improbable) errors and small (probable) errors. For the second class we restate the requirement as follows: given H(x) and x + s, x can be computed efficiently as long as s is small enough. Such a hash function is called an error-correcting code. Two important classes of these codes are cyclic redundancy checks and Reed-Solomon codes.

Audio recognition
For applications such as matching an MP3 file against a known list, one option is to use a traditional hash function such as MD5. However, this approach is very sensitive to time shifts, CD read errors, different audio compression algorithms, and differences in how volume adjustment is implemented. MD5-like methods are useful for quickly finding audio files that are strictly identical (as binary data), but finding all files with the same audio content requires other, more advanced algorithms. Contrary to what people who do not follow the industry closely often assume, hash functions that are robust to such small differences do exist. Most existing hash algorithms are not robust enough, but a few are robust enough to identify music played through loudspeakers in a noisy room. The Shazam[1] service is a practical example. The user dials a specific number on a telephone and holds the phone's microphone close to the speaker that is playing the music. The service analyses the music being played and compares it with known hash values stored in its database.
The user then receives the title of the recognized song (for a fee).

Information security
Hash algorithms are applied in information security mainly in the following three ways:
(1) File verification
The verification algorithms we are most familiar with are parity checks and CRC checks. Neither can resist deliberate tampering: they can detect and, to some extent, correct channel errors during transmission, but they cannot prevent malicious damage to the data. The "digital fingerprint" property of the MD5 hash algorithm has made it the most widely used file integrity checksum algorithm, and many UNIX systems provide a command for computing MD5 checksums.
(2) Digital signatures
Hash algorithms are also an important part of modern cryptographic systems. Because asymmetric algorithms are slow, one-way hash functions play an important role in digital signature protocols. Signing the hash value, also called the "digital digest", can be considered statistically equivalent to signing the file itself, and such a protocol has other advantages as well.
(3) Authentication protocols
The authentication protocol in question is also known as the challenge-response mode. It is a simple and secure method when the transmission channel can be eavesdropped on but not tampered with.

The above is some basic background on hashing and related topics.

Hash functions
(1) Remainder method: first estimate the number of entries in the whole hash table. Then divide each original value by this estimate to obtain a quotient and a remainder, and use the remainder as the hash value. Because this approach is quite likely to produce collisions, any search algorithm built on it must be able to detect a collision and provide an alternative.
(2) Folding method: used when the original value is a number. Split the original value into several parts, add the parts together, and take the last four digits of the sum (or some other number of digits) as the hash value.
(3) Radix conversion method: when the original value is a number, it can be converted to a different number base. For example, a decimal value can be converted to a hexadecimal hash value. To make all hash values the same length, the high-order digits can be dropped.
(4) Digit rearrangement method: this method simply shuffles the digits of the original value. For example, the third through sixth digits can be reversed, and the rearranged number is used as the hash value.

Hash functions are not interchangeable: a hash function that gives good results in a database may be unsuitable for cryptography or error checking. Several hash functions are well known in cryptography. They include MD2, MD4 and MD5, which are used to hash a digital signature into a shorter value called a message digest, and the Secure Hash Algorithm (SHA), a standard algorithm that produces a larger (160-bit) message digest and is somewhat similar to MD4.

The hash value of a file
As is well known, eMule is P2P software (P2P is short for peer-to-peer, meaning client-to-client file transfer over a peer network). It uses the Multisource File Transfer Protocol (MFTP), which defines a set of standards for transmission, compression, packaging and credits. eMule computes an MD4-based hash for every file, which makes each file unique and traceable across the network. This digital digest of the file is computed by the hash function.
Regardless of the length of the file, the hash function produces a fixed-length number. Unlike an encryption algorithm, a hash algorithm is an irreversible one-way function. With a strong hash algorithm, such as MD5 or SHA, two different files will practically never produce the same hash value by accident, so any modification to a file can be detected.

When we share a file through eMule, eMule automatically generates the file's hash value with the hash algorithm. This value is the unique identifier of the file; it encodes the basic information of the file and is submitted to the server eMule is connected to. When someone else requests the file, the hash value tells them whether the file they are downloading is the one they want. This is especially important once other properties of the file, such as its name, have been changed. The server also provides the address, port and other details of the users who currently hold the file, so eMule knows where to download it from.

In general, when we search for a file, eMule obtains this information and then asks the server for files with the same hash value. The server returns information about the users who hold the file, and our client can then contact those users directly to see whether the file can be downloaded from them.

In eMule a file's hash value is fixed and unique; it is the message digest of the file. No matter whose machine the file is on and no matter how much time has passed, the value stays the same. While files are being downloaded and uploaded, eMule uses this value to identify them.
Hashing files
In the eMule log we often see that eMule is hashing a file. This is the file-verification use of the hash algorithm mentioned earlier. The process is actually quite involved, and software such as FTP and BT clients relies on the same basic principle. eMule transfers files in blocks, and each block is checked as it arrives; if a block is wrong, it is downloaded again. During this time the relevant information is written to the .met file. When the whole task is complete, the .part file is renamed and moved to the Incoming folder, and the .met file is deleted automatically. This is why we sometimes see a hash failure: the information in the .met file is corrupt and no longer matches the .part file. Sometimes eMule also hashes furiously at startup. There are two situations in which this happens: the first time eMule is used, when it has to hash all the files, and after the computer was shut down in the previous session, when it has to re-check the files for errors.

Research on hash algorithms has long been at the frontier of information science. Now that network technology is everywhere, its importance is ever more prominent: hashing is at work in the security checks behind our everyday online activity and in the key-handling mechanisms of the operating systems we use. For anyone interested in information security it is a key that opens up the field, and it is also a focus of research in the hacker community.
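The block-by-block checking described above for eMule can be illustrated with a short Java sketch. This is not eMule's actual protocol or file format: the class name BlockVerifier, the method names, the use of SHA-1 and the block size are all assumptions made for the example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

final class BlockVerifier {
    /** Hashes the bytes data[from, to) with SHA-1 (an assumed choice of algorithm). */
    private static byte[] hash(byte[] data, int from, int to) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(data, from, to - from);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /** Publisher side: one hash per fixed-size block; the last block may be shorter. */
    static List<byte[]> blockHashes(byte[] file, int blockSize) {
        List<byte[]> hashes = new ArrayList<>();
        for (int from = 0; from < file.length; from += blockSize) {
            int to = Math.min(from + blockSize, file.length);
            hashes.add(hash(file, from, to));
        }
        return hashes;
    }

    /** Downloader side: indices of blocks that fail the check and must be fetched again. */
    static List<Integer> corruptedBlocks(byte[] received, int blockSize, List<byte[]> expected) {
        List<byte[]> actual = blockHashes(received, blockSize);
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < expected.size(); i++) {
            boolean missing = i >= actual.size();
            if (missing || !MessageDigest.isEqual(actual.get(i), expected.get(i))) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        byte[] file = new byte[10];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) i;
        }
        List<byte[]> expected = blockHashes(file, 4); // blocks of 4, 4 and 2 bytes

        byte[] received = file.clone();
        received[5] ^= 0x01; // damage one byte inside the second block
        System.out.println(corruptedBlocks(received, 4, expected)); // [1]
    }
}
```

Checking per block means a single damaged byte costs one block of re-download rather than the whole file, which is the point of the scheme described in the text.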
Userhash
The first time we run eMule, it automatically generates a value. This value is unique and is our identity in the eMule world. As long as you do not uninstall eMule or delete its config, your userhash never changes. The credit system works through this value: eMule uses it to store and identify credits, independently of your ID and your user name. However you change those, your userhash stays the same, which guarantees fairness. It is in effect also a message digest, but of each user's information rather than of a file.

Hash tables
Hash tables are a major application of hash functions. A hash table is used to find data records quickly by key. (Note: a key here is not secret in the way an encryption key is, but it is likewise used to "unlock" or access data.) For example, the keys in an English dictionary are English words, and the associated records contain the definitions of those words. In this case the hash function must map strings of letters to indices into the hash table's internal array. The nearly unattainable ideal for a hash table's hash function is to map every key to a unique index (see perfect hashing), because that guarantees direct access to every item in the table. A good hash function (including most cryptographic hash functions) behaves like a uniform, truly random function, so a target can be found in only one or two probes on average (depending on the load factor). Just as important, a random hash function is extremely unlikely to have a very high collision rate. A small number of collisions is nevertheless unavoidable in practice (see the birthday paradox). In many cases a heuristic hash function produces far fewer collisions than a random hash function, because a heuristic function exploits the regularities among similar keys.
For example, a heuristic function can be designed so that file names such as file0000.chk, file0001.chk, file0002.chk and so on are mapped to consecutive slots in the table, which means that such a sequence causes no collisions. In contrast, a hash function that performs very well on a good set of keys often performs poorly on a bad set of keys, and such bad key sets arise naturally, not only through attacks. A hash table with a poorly performing hash function means that lookups degrade into time-consuming linear searches.

Cracking MD5 and SHA-1
On August 17, 2004, at the International Cryptology Conference (Crypto 2004) held in Santa Barbara, California, Professor Wang Xiaoyun of Shandong University announced for the first time the results obtained by her and her research team: attacks on four well-known cryptographic hash algorithms, MD5, HAVAL-128, MD4 and RIPEMD. In February of the following year she announced an attack on SHA-1.

Linux command: hash
The hash command is used to display, add to, and clear the shell's hash table of command paths. The syntax of the command is as follows.

Syntax
hash [-l] [-r] [-p <path> <name>] [-t <command>]
Option                Description
-l                    Display the hash table, including paths
-r                    Clear the hash table
-p <path> <name>      Add an entry to the hash table
-t <command>          Display the full path of the specified command
The hash command in an FTP client: hash prints a # sign each time a buffer of data is transferred.
