1. Definition: The data is divided into categories such that objects (entities) within the same class are highly similar to each other, while objects in different classes differ greatly.
Given a set of samples without category labels, we group them according to their mutual similarity: similar samples are assigned to the same class, dissimilar samples to different classes. This kind of grouping is called cluster analysis, also known as unsupervised classification.
2. The result depends on two factors: the first is the choice of features — the same samples described by different features will yield different clustering results; the second is the choice of similarity measure, which directly affects the quality of the clustering.
3. Classification:
By clustering criterion: statistical clustering methods, conceptual clustering methods;
By data type: numerical data clustering, discrete (categorical) data clustering, mixed-type data clustering;
By measurement criterion:
Distance-based clustering methods: measure the relationship between pairs of points using various distances or similarities, e.g. K-means.
Density-based clustering methods: group samples according to an appropriate density function.
Connectivity-based clustering methods: mainly graph-based methods; highly connected data tend to fall into the same cluster, e.g. spectral clustering.
By technical route:
Partitioning methods: divide the data according to certain rules, e.g. K-means.
Hierarchical methods: build a hierarchy over the given samples, e.g. hierarchical clustering.
Density methods: estimate the density of the data, e.g. the Gaussian mixture model.
Grid methods: divide the data space into a finite grid of cells, then cluster based on the grid structure.
Model methods: introduce a model for each cluster, then assign the data so as to best fit the models.
4. Distance and similarity measures
See also: http://www.cnblogs.com/simayuhe/p/5297560.html
Note: A function d(·,·) can be called a distance (metric) if it satisfies four conditions: non-negativity d(x, y) ≥ 0; identity d(x, y) = 0 if and only if x = y; symmetry d(x, y) = d(y, x); and the triangle inequality d(x, z) ≤ d(x, y) + d(y, z).
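As a quick sanity check, the sketch below verifies the four metric conditions numerically for the Euclidean distance on a few sample points (the function name and the points are illustrative):

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length point tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]

for x in points:
    for y in points:
        d = euclidean(x, y)
        assert d >= 0                        # non-negativity
        assert (d == 0) == (x == y)          # identity: d = 0 iff x = y
        assert d == euclidean(y, x)          # symmetry
        for z in points:                     # triangle inequality
            assert euclidean(x, z) <= d + euclidean(y, z) + 1e-12
```

Any function passing these checks on all point triples behaves as a metric on that set; other choices (e.g. Manhattan distance) can be verified the same way.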
5. Mixture density functions
Mixture density estimation provides methodological guidance for data clustering.
Note: The discussion here is a general formulation of clustering; the Gaussian mixture is only a common special case, not the only one.
Assumptions:
– The samples come from C different classes, and C is known.
– The prior probability P(ω_j) of each class, j = 1, 2, ..., C, is known.
– The form of each class-conditional probability density function p(x|ω_j, θ_j) is known.
– The C parameter vectors θ_j, j = 1, 2, ..., C, are unknown.
– The class labels of the samples are also unknown.
First, consider the data generation process: a class is selected from the C classes according to the prior probabilities, and then a sample is drawn from that class according to its class-conditional density.
We then reverse this generation process: given a set of unlabeled samples, we still assume they obey the mixture density distribution, but we do not know the proportion of each class or the parameters of each class-conditional density; these are estimated by maximum likelihood estimation. (C is still known.)
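As a sketch of the generation process just described — pick a class by its prior, then draw from that class's conditional density — here is a 1-D two-component Gaussian mixture sampler (the priors, means, and variances are illustrative assumptions):

```python
import random

random.seed(0)

priors = [0.3, 0.7]    # P(w_1), P(w_2): illustrative class priors
means  = [-2.0, 3.0]   # class-conditional means (illustrative)
sigmas = [1.0, 0.5]    # class-conditional standard deviations (illustrative)

def sample_mixture(n):
    """Two-step generation: choose a class j by its prior probability,
    then draw one sample from that class's Gaussian conditional density."""
    data = []
    for _ in range(n):
        j = random.choices(range(len(priors)), weights=priors)[0]
        data.append(random.gauss(means[j], sigmas[j]))
    return data

samples = sample_mixture(1000)
```

The clustering problem is the reverse direction: given only `samples`, recover the priors and the class parameters.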
See Pattern Recognition, Zhang Xuegong, 3rd edition, p. 187.
Log-likelihood of the mixture density over the n samples x_1, ..., x_n:

l(\theta) = \sum_{k=1}^{n} \ln p(x_k|\theta), \quad p(x_k|\theta) = \sum_{j=1}^{C} p(x_k|\omega_j, \theta_j) P(\omega_j)

Taking the gradient with respect to \theta_i and setting it to zero gives:

\sum_{k=1}^{n} P(\omega_i|x_k, \theta) \, \nabla_{\theta_i} \ln p(x_k|\omega_i, \theta_i) = 0

where the posterior probability is P(\omega_i|x_k, \theta) = p(x_k|\omega_i, \theta_i) P(\omega_i) \big/ \sum_{j=1}^{C} p(x_k|\omega_j, \theta_j) P(\omega_j).

If the priors P(\omega_i) are also treated as unknowns, they are subject to the equality constraint \sum_{i=1}^{C} P(\omega_i) = 1 (with P(\omega_i) \ge 0); such equality-constrained optimization problems are usually solved with the Lagrange multiplier method.

Finally we obtain:

\hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i|x_k, \hat{\theta})

In summary, the two conditions satisfied by the maximum likelihood estimates are:

1. \hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i|x_k, \hat{\theta})
2. \sum_{k=1}^{n} \hat{P}(\omega_i|x_k, \hat{\theta}) \, \nabla_{\theta_i} \ln p(x_k|\omega_i, \hat{\theta}_i) = 0
The above is a general derivation; we now apply the results to the Gaussian mixture.
Each component of the Gaussian mixture is a multivariate normal density:

p(x|\omega_i, \theta_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu_i)^{T} \Sigma_i^{-1} (x-\mu_i)\right)

Consider the case where the covariance \Sigma_i is known and only the mean \mu_i is unknown.
Substituting \nabla_{\mu_i} \ln p(x_k|\omega_i, \mu_i) = \Sigma_i^{-1}(x_k - \mu_i) into condition 2 (note that x carries the sample index k):

\sum_{k=1}^{n} \hat{P}(\omega_i|x_k, \hat{\mu}) \, \Sigma_i^{-1} (x_k - \hat{\mu}_i) = 0

Solving this equation for the mean:

\hat{\mu}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i|x_k, \hat{\mu}) \, x_k}{\sum_{k=1}^{n} \hat{P}(\omega_i|x_k, \hat{\mu})}

Expanded and written in weight form:

\hat{\mu}_i = \sum_{k=1}^{n} w_{ik} x_k, \quad w_{ik} = \frac{\hat{P}(\omega_i|x_k, \hat{\mu})}{\sum_{k'=1}^{n} \hat{P}(\omega_i|x_{k'}, \hat{\mu})}
The formula above shows that the maximum likelihood estimate of a class mean is a weighted average of all samples, where the weight w_ik indicates how likely it is that sample x_k belongs to class i.
If we assume the weights for class i are nonzero only for the samples that actually belong to class i, the formula simplifies to the plain average of those samples.
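As a minimal numerical sketch of this weighted average, assume a 1-D mixture of two Gaussians with known unit variances and equal priors (all values below are illustrative); the code computes the posterior weights and the resulting weighted-mean estimates:

```python
import math

def gauss_pdf(x, mu, sigma=1.0):
    """1-D normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

samples = [-2.1, -1.9, 2.0, 2.2, 1.8]   # illustrative unlabeled samples
priors  = [0.5, 0.5]                    # assumed equal class priors
mus     = [-1.0, 1.0]                   # current (rough) mean estimates

# Posterior probability P(w_i | x_k) of each class for each sample:
post = []
for x in samples:
    joint = [p * gauss_pdf(x, mu) for p, mu in zip(priors, mus)]
    total = sum(joint)
    post.append([j / total for j in joint])

# Weighted-average estimate of each class mean, as in the formula above:
new_mus = []
for i in range(len(mus)):
    w_sum = sum(post[k][i] for k in range(len(samples)))
    new_mus.append(sum(post[k][i] * samples[k] for k in range(len(samples))) / w_sum)
```

Even from the rough starting means, the weighted averages move toward the two sample groups around -2 and +2; iterating this update is one step of an EM-style procedure.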
This simplification leads to a more concrete method: k-means clustering, where k refers to the given number of classes C mentioned above, and each sample is assigned entirely to the class whose mean is nearest.
Here "nearest" requires a given distance measure, such as the Euclidean distance.
Algorithm description:
1. Choose initial estimates of the k class means.
2. Assign each sample to the class with the nearest mean.
3. Recompute each class mean as the average of the samples assigned to it.
4. Repeat steps 2 and 3 until the assignments no longer change.
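A minimal runnable sketch of the k-means procedure on 1-D toy data (the dataset, seed, and function name are illustrative assumptions):

```python
import random

def kmeans(data, k, iters=100, seed=0):
    """Plain k-means on 1-D data: hard-assign each sample to the nearest
    mean, recompute means as class averages, stop when assignments fix."""
    rng = random.Random(seed)
    means = rng.sample(data, k)              # initialize k means from the data
    assign = [None] * len(data)
    for _ in range(iters):
        # Assign each sample to the class with the nearest current mean.
        new_assign = [min(range(k), key=lambda i: abs(x - means[i])) for x in data]
        if new_assign == assign:             # stop when assignments are stable
            break
        assign = new_assign
        # Recompute each mean as the average of the samples assigned to it.
        for i in range(k):
            members = [x for x, a in zip(data, assign) if a == i]
            if members:
                means[i] = sum(members) / len(members)
    return means, assign

data = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.8]    # two well-separated groups
means, assign = kmeans(data, 2)
```

On this toy data the two recovered means converge to the group averages -2.0 and 2.0; with less separated clusters the result depends on initialization, which is why k-means is often run several times with different seeds.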
Pattern Recognition class notes: Clustering (1)