Definition of Cluster Analysis:
Clustering is the process of grouping a collection of physical or abstract objects into multiple classes, each consisting of similar objects, so that data can be collected and classified on the basis of similarity.
Traditional clustering algorithms can be divided into five categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.
"K-means algorithm"
First, K objects are selected from the N data objects as the initial cluster centers. Each remaining object is then assigned to the cluster whose center it is most similar to, according to its similarity (distance) to the cluster centers. The center of each new cluster (the mean of all objects in the cluster) is then recomputed, and the process is repeated until the criterion function converges. The mean squared error is generally used as the criterion function. The resulting K clusters have the following characteristics: each cluster is as compact as possible, and the clusters are as well separated from one another as possible.
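The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration: the function names, the use of squared Euclidean distance, the seeded random initialization, and the "centers stopped moving" stopping test are assumptions made for the example, not details taken from the text.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Return (centers, clusters) after iterating assignment and update steps."""
    rng = random.Random(seed)
    # Step 1: pick K of the N objects as the initial cluster centers.
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each center as the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            mean_point(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

Calling `k_means(points, 2)` on two well-separated groups of points should return one center near each group. Results generally depend on the initial centers, so in practice the procedure is often run several times with different seeds.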
What aspects of NLP are actually applied?
"K-prototypes algorithm"
The k-prototypes algorithm combines the k-means method with the k-modes method, a variant of k-means improved to handle symbolic attributes. Compared with the k-means method, the k-prototypes algorithm is therefore capable of handling symbolic attributes as well as numeric ones.
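To make the combination concrete, here is a small sketch of how a mixed dissimilarity and a prototype update might look. The formulation used (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of mismatching symbolic attributes) is the commonly cited one, but the text does not spell it out, so treat the function names, the index lists, and `gamma` as assumptions of this example.

```python
from collections import Counter


def mixed_dissimilarity(obj, prototype, numeric_idx, categorical_idx, gamma=1.0):
    """Numeric part as in k-means, symbolic part as in k-modes.

    gamma (an assumed weight) balances the two parts.
    """
    numeric_part = sum((obj[i] - prototype[i]) ** 2 for i in numeric_idx)
    categorical_part = sum(1 for i in categorical_idx if obj[i] != prototype[i])
    return numeric_part + gamma * categorical_part


def update_prototype(members, numeric_idx, categorical_idx):
    """New prototype: mean of numeric attributes, mode of symbolic ones."""
    prototype = [None] * len(members[0])
    for i in numeric_idx:
        prototype[i] = sum(m[i] for m in members) / len(members)
    for i in categorical_idx:
        prototype[i] = Counter(m[i] for m in members).most_common(1)[0][0]
    return tuple(prototype)
```

The outer loop is the same as in k-means: assign each object to the prototype with the smallest mixed dissimilarity, update the prototypes, and repeat until the assignments stop changing.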
"Fuzzy Algorithm"
First, a fuzzy similarity matrix is established, and the objects are then clustered on the basis of it. There are generally two types of approach.
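The text does not say which clustering procedure it has in mind, so the sketch below shows just one well-known option as an illustration: take the transitive closure of the fuzzy similarity matrix by repeated max-min composition, then cut it at a threshold. It assumes the matrix is reflexive and symmetric with values in [0, 1]; the function names and the threshold parameter `lam` are hypothetical.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(r)
    return [
        [max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r):
    """Square the relation repeatedly until it stops changing."""
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r
        r = squared


def lambda_cut_clusters(closure, lam):
    """Group objects whose closure similarity is at least lam."""
    clusters = []
    assigned = set()
    for i in range(len(closure)):
        if i in assigned:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        assigned.update(group)
        clusters.append(group)
    return clusters
```

Lowering `lam` merges clusters and raising it splits them, so sweeping the threshold gives a hierarchy of groupings from a single matrix.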
"CLARANS Algorithm" (partitioning method)
The CLARANS algorithm, a clustering algorithm based on randomized search, is a partitioning clustering method. It first randomly selects a point as the current point, then randomly examines some of its neighboring points, no more than the number set by the parameter Maxneighbor. If a better neighboring point is found, the search moves to that neighbor; otherwise the current point is taken as a local minimum. The algorithm then randomly selects another point and searches for another local minimum, until the number of local minima found reaches the number requested by the user.
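The search loop can be sketched as follows. In the usual formulation of CLARANS, each "point" of the search is a set of k medoids and a neighbor differs from it in exactly one medoid; the sketch follows that reading. The function names, the Euclidean cost, and the default parameter values are assumptions for illustration only.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every object to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, num_local=2, max_neighbor=20, seed=0):
    """Return (medoid indices, cost) of the best local minimum found.

    Requires len(points) > k so that every node has at least one neighbor.
    """
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, float("inf")
    for _ in range(num_local):
        # Start from a random node: k distinct objects used as medoids.
        current = rng.sample(range(n), k)
        current_cost = total_cost(points, current)
        checked = 0
        while checked < max_neighbor:
            # A random neighbor swaps one medoid for one non-medoid.
            neighbor = list(current)
            non_medoids = [i for i in range(n) if i not in current]
            neighbor[rng.randrange(k)] = rng.choice(non_medoids)
            neighbor_cost = total_cost(points, neighbor)
            if neighbor_cost < current_cost:
                # Better neighbor: move there and restart the count.
                current, current_cost = neighbor, neighbor_cost
                checked = 0
            else:
                checked += 1
        # No better neighbor in max_neighbor tries: treat as a local minimum.
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return sorted(best), best_cost
```

Because only a random sample of neighbors is examined, the result is a local minimum with respect to the sampled neighbors, not a guaranteed global optimum; raising `num_local` or `max_neighbor` trades running time for quality.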
"BIRCH Algorithm" (Hierarchical method)
The core idea is to use a clustering feature, a 3-tuple, to summarize the information of a cluster, so that a cluster of points is represented by its clustering feature rather than by the specific set of points. Clusters are obtained by constructing a clustering feature tree that satisfies the branching factor and cluster diameter limits. Using clustering features, the BIRCH algorithm can conveniently compute the center, radius, and diameter of a cluster, as well as the distance between clusters.
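As an illustration of why the 3-tuple is enough, the sketch below stores the usual clustering feature contents (the number of points N, their linear sum LS, and their sum of squared norms SS) and derives the center, radius, and diameter from them without keeping the points. The class and method names are made up for this example, and the tree construction itself is not shown.

```python
import math


class ClusteringFeature:
    """Summary (N, LS, SS) of a cluster; the points themselves are not stored."""

    def __init__(self, n, linear_sum, square_sum):
        self.n = n                    # N: number of points
        self.linear_sum = linear_sum  # LS: coordinate-wise sum of the points
        self.square_sum = square_sum  # SS: sum of squared norms of the points

    @classmethod
    def from_point(cls, point):
        return cls(1, tuple(point), sum(x * x for x in point))

    def merge(self, other):
        """Clustering features are additive, so merging is a component-wise sum."""
        summed = tuple(a + b for a, b in zip(self.linear_sum, other.linear_sum))
        return ClusteringFeature(
            self.n + other.n, summed, self.square_sum + other.square_sum
        )

    def centroid(self):
        return tuple(s / self.n for s in self.linear_sum)

    def radius(self):
        """Root mean squared distance from the points to the centroid."""
        c = self.centroid()
        return math.sqrt(max(0.0, self.square_sum / self.n - sum(x * x for x in c)))

    def diameter(self):
        """Root mean squared pairwise distance within the cluster."""
        if self.n < 2:
            return 0.0
        ls_sq = sum(s * s for s in self.linear_sum)
        value = (2 * self.n * self.square_sum - 2 * ls_sq) / (self.n * (self.n - 1))
        return math.sqrt(max(0.0, value))


def centroid_distance(cf_a, cf_b):
    """One possible distance between two clusters: distance between centroids."""
    return math.dist(cf_a.centroid(), cf_b.centroid())
```

For example, merging the features of the points (0, 0) and (2, 0) gives a centroid of (1, 0), a radius of 1, and a diameter of 2, all computed from three stored quantities.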
"CURE Algorithm" (Hierarchical method)
The CURE algorithm clusters using representative points. The algorithm first treats each data point as its own class, then repeatedly merges the two nearest classes until the required number of classes is reached. CURE improves on the traditional ways of representing a class: instead of using all points, or a center and radius, it extracts a fixed number of points from each class as its representatives and shrinks them toward the center of the class by an appropriate contraction factor. Because a class is represented by several representative points, it can extend to non-spherical shapes.
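The representative-point idea can be sketched as below. The farthest-point heuristic used to pick scattered points, the parameter names `num_rep` and `shrink`, and the choice of measuring cluster distance between the closest pair of representatives are assumptions made for this example; the merging loop is omitted.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def representatives(cluster, num_rep=4, shrink=0.5):
    """Pick up to num_rep scattered points and shrink them toward the centroid."""
    center = centroid(cluster)
    chosen = []  # indices of the selected points
    for _ in range(min(num_rep, len(cluster))):
        remaining = [i for i in range(len(cluster)) if i not in chosen]
        if not chosen:
            # First representative: the point farthest from the centroid.
            pick = max(remaining, key=lambda i: math.dist(cluster[i], center))
        else:
            # Later ones: the point farthest from those already chosen.
            pick = max(
                remaining,
                key=lambda i: min(math.dist(cluster[i], cluster[j]) for j in chosen),
            )
        chosen.append(pick)
    # Move each chosen point a fraction `shrink` of the way to the centroid.
    return [
        tuple(x + shrink * (c - x) for x, c in zip(cluster[i], center))
        for i in chosen
    ]


def cluster_distance(reps_a, reps_b):
    """Distance between two clusters: closest pair of representatives."""
    return min(math.dist(p, q) for p in reps_a for q in reps_b)
```

Shrinking pulls outliers in toward the bulk of the class, which makes the closest-pair distance less sensitive to stray points than it would be on the raw data.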
"DBSCAN algorithm" (density-based approach)
By using the density connectivity of classes, DBSCAN can quickly discover classes of arbitrary shape. The basic idea is that, for each object in a class, the neighborhood of a given radius must contain at least a given minimum number of objects.
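A compact sketch of the density-based expansion follows. It uses a brute-force neighborhood search for clarity; the parameter names `eps` (the given radius) and `min_pts` (the given minimum number of objects), and the convention of labeling noise as -1, are choices made for this example.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels
```

Only points whose neighborhood is dense enough (core points) extend a cluster; points that are merely reachable from a core point join it as border points, and everything else is left as noise.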
Via: Big Data Magic Mirror, the first free Big Data visualization analysis tool in China (www.moojnn.com)
Clustering Analysis Algorithm---Learning