Cluster Analysis notes

Source: Internet
Author: User

1. What is clustering?
Definition:
The process of grouping a collection of physical or abstract objects into classes of similar objects is called clustering.
A cluster produced by clustering is a collection of data objects that are similar to the other objects in the same cluster and dissimilar to the objects in other clusters.

Clustering differs from classification. Classification is supervised learning: the number of classes is known.
Clustering is unsupervised learning: the number of classes is unknown.
Typical applications:
In business, clustering is used to identify distinct customer groups and to characterize each group by its purchasing patterns.
In biology, it is used to derive plant and animal taxonomies, categorize genes, and gain insight into the structure inherent in populations.
In games, it can be used to group players, games, and game characters to extract useful information.
Active research topics:
Typical requirements that data mining places on clustering:
Scalability: the method should scale to large volumes of data.
Ability to handle different types of attributes
Discovery of clusters of arbitrary shape
Minimal domain knowledge needed to determine input parameters: input parameters have a great impact on the clustering result.
Ability to handle noisy data
Insensitivity to the order of input records
High dimensionality
Constraint-based clustering
Interpretability and usability
2. Data Types in Cluster Analysis
(1) Data Matrix
n objects are each described by p variables, giving an n x p matrix.
(2) Dissimilarity Matrix
Stores the proximity between every pair of the n objects, giving an n x n matrix.
A data matrix is called a two-mode matrix, while a dissimilarity matrix is called a one-mode matrix.
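
The relationship between the two structures can be illustrated with a minimal Python sketch that derives an n x n matrix of pairwise distances from an n x p data matrix. The notes do not name a distance measure at this point, so Euclidean distance is an assumption made here for illustration, and the function name `dissimilarity_matrix` and the sample data are hypothetical.

```python
import math

def dissimilarity_matrix(data):
    """Build an n x n dissimilarity matrix from an n x p data matrix.

    Assumption: dissimilarity is measured by Euclidean distance.
    """
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            # Distance between object i and object j over all p variables.
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
            # The matrix is symmetric, so fill both halves at once.
            d[i][j] = d[j][i] = dist
    return d

# Hypothetical data matrix: n = 3 objects, p = 2 variables.
data = [
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
]

D = dissimilarity_matrix(data)
for row in D:
    print(row)
```

The rows and columns of the data matrix refer to different entities (objects and variables), while both the rows and the columns of the result refer to objects. The diagonal is zero and the matrix is symmetric, so only the lower triangle actually needs to be stored.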

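Interval-scaled variables
Measurements of a variable f can be standardized in two steps.
(1) Calculate the mean absolute deviation.
$s_f = \frac{1}{n}\left(|x_{1f} - m_f| + |x_{2f} - m_f| + \cdots + |x_{nf} - m_f|\right)$
Here $x_{1f}, \ldots, x_{nf}$ are the n measurements of f, and $m_f$ is the mean value of f, that is, $m_f = \frac{1}{n}\left(x_{1f} + x_{2f} + \cdots + x_{nf}\right)$
(2) Calculate the standardized measurement (z-score).
$z_{if} = \frac{x_{if} - m_f}{s_f}$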
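
The two steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `standardize`, the sample measurements, and the choice to raise an error when all measurements are identical are assumptions that do not come from the notes.

```python
def standardize(values):
    """Standardize the n measurements of one variable f.

    Step 1: compute the mean m_f and the mean absolute deviation s_f.
    Step 2: return z_if = (x_if - m_f) / s_f for every measurement.
    """
    n = len(values)
    m_f = sum(values) / n
    s_f = sum(abs(x - m_f) for x in values) / n
    if s_f == 0:
        # Assumption: identical measurements are treated as an error,
        # because the z-score would require dividing by zero.
        raise ValueError("all measurements are identical; s_f is zero")
    return [(x - m_f) / s_f for x in values]

# Hypothetical measurements of one variable for n = 4 objects.
heights_cm = [150.0, 160.0, 170.0, 180.0]
z = standardize(heights_cm)
print(z)
```

For this sample the mean is 165 and the mean absolute deviation is 10, so the standardized values are -1.5, -0.5, 0.5, and 1.5. They no longer depend on the unit in which the variable was measured.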
