Calculating similarity with the Jaccard similarity coefficient

Source: Internet
Author: User

Jaccard index

From Wikipedia, the free encyclopedia

The Jaccard index, also known as the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

J(A, B) = |A ∩ B| / |A ∪ B|

(If A and B are both empty, we define J(A, B) = 1.)
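As a minimal sketch of the definition above, the coefficient can be computed directly with Python's built-in set operations. The function name jaccard is chosen here for illustration and is not part of the source text.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient of two finite collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty: J is defined as 1 by convention.
        return 1.0
    return len(a & b) / len(union)


print(jaccard({1, 2, 3}, {2, 3, 4}))  # 2 shared elements out of 4 in total: 0.5
print(jaccard(set(), set()))          # empty-set convention: 1.0
```

Converting the inputs with set() means lists with repeated elements are accepted too, with duplicates ignored.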

The MinHash min-wise independent permutations locality-sensitive hashing scheme may be used to efficiently compute an accurate estimate of the Jaccard similarity coefficient of pairs of sets, where each set is represented by a constant-sized signature derived from the minimum values of a hash function.
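The idea can be sketched as follows, under some assumptions: a family of seeded hash functions stands in for the random permutations, and the estimate is the fraction of signature positions on which two sets agree. The helper names (_hash, minhash_signature, estimate_jaccard), the choice of BLAKE2b as the hash, and the default of 128 hash functions are all illustrative choices, not something the source prescribes.

```python
import hashlib


def _hash(seed, item):
    # Seeded 64-bit hash; an assumed stand-in for one random permutation.
    data = f"{seed}:{item!r}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    """Constant-sized signature: the minimum hash value under each seed."""
    items = set(items)
    if not items:
        raise ValueError("signature is undefined for an empty set")
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have equal length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


a = set(range(0, 100))
b = set(range(50, 150))
sig_a = minhash_signature(a, num_hashes=256)
sig_b = minhash_signature(b, num_hashes=256)
# The exact coefficient is 50 / 150; the estimate should land near it.
print(estimate_jaccard(sig_a, sig_b))
```

The signature length does not depend on the size of the set, so comparing two signatures costs the same regardless of how large the original sets were. More hash functions give a tighter estimate at the cost of a longer signature.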

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of the two sets by the size of the union:

d_J(A, B) = 1 − J(A, B) = (|A ∪ B| − |A ∩ B|) / |A ∪ B|

An alternate interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference A △ B = (A ∪ B) − (A ∩ B) to the size of the union.

This distance is a metric on the collection of all finite sets. [1][2]
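The two descriptions of the distance given above (union minus intersection over union, and symmetric difference over union) can be checked against each other with a short sketch. The function names are illustrative, and the triangle-inequality line is only a spot check on one example, not a proof of the metric property.

```python
def jaccard_distance(a, b):
    """(|A ∪ B| - |A ∩ B|) / |A ∪ B|, i.e. 1 minus the Jaccard coefficient."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # J = 1 for two empty sets by convention, so the distance is 0.
        return 0.0
    return (len(union) - len(a & b)) / len(union)


def jaccard_distance_symdiff(a, b):
    """Same distance, written as |A △ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)


a, b, c = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
print(jaccard_distance(a, b))          # 0.5
print(jaccard_distance_symdiff(a, b))  # 0.5
# Spot check of the triangle inequality on this one example.
print(jaccard_distance(a, c) <= jaccard_distance(a, b) + jaccard_distance(b, c))  # True
```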

There is also a version of the Jaccard distance for measures, including probability measures. If μ is a measure on a measurable space X, then we define the Jaccard coefficient by J_μ(A, B) = μ(A ∩ B) / μ(A ∪ B), and the Jaccard distance by d_μ(A, B) = 1 − J_μ(A, B) = μ(A △ B) / μ(A ∪ B). Care must be taken if μ(A ∪ B) = 0 or ∞, since these formulas are not well defined in those cases.
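For intuition, the measure-based version can be sketched on a finite space where each point carries a nonnegative weight and the measure of a subset is the sum of its weights. This weighted-dictionary setup and the function names are assumptions made for illustration; the general definition applies to any measure.

```python
import math


def measure(weights, subset):
    """Measure of a subset: the sum of the weights of its points."""
    return sum(weights[x] for x in subset)


def jaccard_measure(weights, a, b):
    """Measure of the intersection divided by the measure of the union."""
    a, b = set(a), set(b)
    mu_union = measure(weights, a | b)
    if mu_union == 0 or math.isinf(mu_union):
        # The ratio is not well defined when the union has measure 0 or infinity.
        raise ValueError("undefined when the union has measure 0 or infinity")
    return measure(weights, a & b) / mu_union


def jaccard_measure_distance(weights, a, b):
    return 1.0 - jaccard_measure(weights, a, b)


# A probability measure on three points (weights sum to 1).
weights = {"x": 0.5, "y": 0.25, "z": 0.25}
print(jaccard_measure(weights, {"x", "y"}, {"y", "z"}))           # 0.25 / 1.0 = 0.25
print(jaccard_measure_distance(weights, {"x", "y"}, {"y", "z"}))  # 0.75
```

With every weight set to 1 this reduces to the ordinary set-based coefficient, since the measure of a subset is then just its size.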
