Minimum hash function

Source: Internet
Author: User

1.1 Introduction to the Algorithm

In simple terms, minimum hashing (minhash) picks n items, in a pseudo-random way, from all the products a user likes. Users for whom the same n items are picked are considered similar and are placed in the same class. For example, suppose user A likes products {A, B, C}, user B likes products {B, C, D}, and user C likes products {C, E, F}, and 2 items are picked from each user's list. If the items picked for user A are {B, C}, for user B {B, C}, and for user C {C, E}, then users A and B, who share the picked items {B, C}, are considered to have the same interests and belong to the same class. The items picked for user C differ from those of users A and B, so user C is not placed in the same class as them.

The method is called minimum hash because the items are picked by taking minimum hash values. A hash function (a function that takes a product as input and returns a value) is applied to every product, and the product with the smallest hash value is the one picked. To pick several items, use several hash functions, one per item. Continuing the example above, to pick two items, prepare two hash functions H1 and H2 ("hash" here is just the name of the function; a hash function tries to map different inputs to different values) and apply each of them to every product. For user A, use H1 to compute the three values H1(A), H1(B), H1(C) and compare them; if the smallest is H1(B), record B. Then use H2 to compute three values in the same way and take the minimum, say H2(C). The pair "H1(B)_H2(C)" then identifies the group that user A belongs to. If the minimum values for user B under H1 and H2 are also H1(B) and H2(C), then user B also belongs to the group "H1(B)_H2(C)", so users A and B are in the same group, that is, they have similar interests.
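The worked example can be sketched in a few lines of Python. The two lookup tables below stand in for the hash functions H1 and H2; their numeric values are hypothetical, chosen only so that the minimums match the example in the text, and the names `min_item`, `group_key`, and `likes` are illustrative.

```python
# Toy stand-ins for the hash functions H1 and H2.
# The values are made up for illustration; a real system would use
# actual hash functions rather than hand-written tables.
H1 = {"A": 5, "B": 1, "C": 4, "D": 6, "E": 2, "F": 7}
H2 = {"A": 3, "B": 4, "C": 0, "D": 5, "E": 2, "F": 6}


def min_item(items, table):
    """Return the item whose hash value in `table` is smallest."""
    return min(items, key=lambda item: table[item])


def group_key(items):
    """Build the group name from the minimum item under H1 and under H2."""
    return f"{min_item(items, H1)}_{min_item(items, H2)}"


likes = {
    "user_a": {"A", "B", "C"},
    "user_b": {"B", "C", "D"},
    "user_c": {"C", "E", "F"},
}

keys = {user: group_key(items) for user, items in likes.items()}
for user, key in sorted(keys.items()):
    print(user, key)
```

With these tables, users A and B both receive the key `B_C` and land in the same group, while user C receives `E_C`.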

In practice, many hash functions are used, for example 2n of them, so that each user is assigned to n groups (as described above, the values of two hash functions make up one group number, so 2n hash functions yield n groups).
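One way to sketch the 2n-hash scheme is shown below. It is a minimal illustration, not a definitive implementation: the salted MD5 hash family (`seeded_hash`), the pairing of consecutive hash functions, and the index prefix added to each group key are all assumptions made here for the example.

```python
import hashlib
from collections import defaultdict


def seeded_hash(seed, item):
    """One member of a hash family: MD5 of the item salted with `seed`."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(items, num_hashes):
    """For each hash function, keep the item with the smallest hash value."""
    return [
        min(items, key=lambda item: seeded_hash(seed, item))
        for seed in range(num_hashes)
    ]


def group_keys(items, n):
    """Use 2n hash functions; each consecutive pair forms one group key."""
    sig = signature(items, 2 * n)
    # The index prefix keeps keys from different hash pairs apart.
    return [f"{i}:{sig[2 * i]}_{sig[2 * i + 1]}" for i in range(n)]


def cluster(likes, n):
    """Map every group key to the set of users that produced it."""
    groups = defaultdict(set)
    for user, items in likes.items():
        for key in group_keys(items, n):
            groups[key].add(user)
    return groups


if __name__ == "__main__":
    likes = {
        "user_a": {"A", "B", "C"},
        "user_b": {"B", "C", "D"},
        "user_c": {"C", "E", "F"},
    }
    for key, users in sorted(cluster(likes, n=3).items()):
        print(key, sorted(users))
```

Two users count as similar if they share at least one of their n group keys, so raising n gives similar users more chances to meet, at the cost of storing more keys per user.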

1.2 Algorithm Application

Minimum hash is used here as a clustering algorithm: users are clustered according to the products they like, so that users who like the same products are gathered into one class.

The advantages of the minimum hash clustering algorithm are:

(1) The calculation is relatively simple and more efficient than comparing users pairwise.

(2) It can quickly identify users who like the same products.

(3) It makes clustering users easy.

The disadvantages of the minimum hash clustering algorithm are:

(1) The hash functions introduce an element of chance, so some similar users are likely to be missed.

(2) Using more hash functions increases the number of groups, and the amount of data grows linearly with the number of hash functions.
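The trade-off between the two disadvantages can be estimated with a short calculation. The sketch below relies on an assumption not stated in the text: that the hash functions behave like independent random orderings of the products, in which case two users get the same minimum item under one hash function with probability equal to the Jaccard similarity of their product sets. The function names are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    return len(a & b) / len(a | b)


def share_probability(a, b, n, hashes_per_group=2):
    """Estimated chance that two users share at least one of n groups.

    Assumes idealized, independent hash functions, so one hash function
    agrees for both users with probability jaccard(a, b).
    """
    j = jaccard(a, b)
    same_group = j ** hashes_per_group  # every hash in the group must agree
    return 1.0 - (1.0 - same_group) ** n


user_a = {"A", "B", "C"}
user_b = {"B", "C", "D"}

for n in (1, 2, 4, 8):
    print(n, share_probability(user_a, user_b, n))
```

For users A and B from the earlier example the Jaccard similarity is 0.5, so a single group matches with probability 0.25, and four groups raise the chance of at least one match to about 0.68 under the same assumption.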

Minimum hash operates on data of the form "user, product" and produces groups that each contain multiple users, which serves as clustering. It can equally be applied to data of the form "item, attribute" in order to cluster items.
