Classification: Labeling objects according to some criterion and then sorting them into categories by label.
Clustering: The process of analyzing objects that carry no labels in advance in order to discover what makes them group together.
Difference: In classification the categories are defined in advance and their number is fixed. The classifier must be trained on a manually annotated training corpus, so classification belongs to supervised learning. Classification is suitable when the categories or the classification system are already established, such as classifying books according to a national library classification scheme.
Clustering has no predefined categories, and the number of categories is not known in advance. It requires neither manual annotation nor a pre-trained classifier; the categories are generated automatically during the clustering process. Clustering is suitable when no classification system exists and the number of categories is uncertain. It is generally used as the front end of other applications, such as multi-document summarization and the clustering of results returned by search engines (metasearch).
The purpose of classification is to learn a classification function or classification model (often called a classifier) that maps each data item in a database to one of a given set of categories. Constructing a classifier requires a training dataset as input. The training set consists of database records or tuples; each tuple is a vector of field values (fields are also called attributes or features), and each training sample additionally carries a class label. A sample can be written as (v1, v2, ..., vn; c), where vi is a field value and c is the category. Methods for constructing classifiers include statistical methods, machine learning methods, neural network methods, and so on.
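As an illustration of training samples of the form (v1, v2, ..., vn; c), here is a minimal Python sketch of one simple statistical classifier, a nearest-centroid rule. It is only one of many possible construction methods, and the function names `train` and `classify` and the toy data are assumptions made for this example, not something the text prescribes.

```python
from collections import defaultdict
from math import dist


def train(samples):
    """Learn one centroid per class from (features, label) training samples."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    # The model maps each class label to the mean feature vector of its samples.
    return {
        label: tuple(s / counts[label] for s in total)
        for label, total in sums.items()
    }


def classify(model, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training set: each sample is ((v1, v2), c).
training_set = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((8.0, 8.0), "B"),
    ((9.0, 9.5), "B"),
]

model = train(training_set)
prediction = classify(model, (2.0, 1.5))
print(prediction)
```

The key point is that the class labels "A" and "B" exist before training starts; the classifier only learns how to map new feature vectors onto them.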
Clustering follows the principle that "birds of a feather flock together": it gathers samples that have no class labels into different groups. Each such collection of data objects is called a cluster, and clustering also includes describing each cluster. The aim is for samples in the same cluster to resemble one another, while samples in different clusters are sufficiently dissimilar.
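The goal of high similarity within a cluster and low similarity between clusters can be checked numerically. The sketch below compares the mean pairwise distance inside a cluster with the mean distance between two clusters; Euclidean distance as the dissimilarity measure, the helper names, and the sample points are all assumptions chosen for illustration.

```python
from itertools import combinations, product
from math import dist


def mean_within_distance(cluster):
    """Average distance over all pairs of points in one cluster (needs >= 2 points)."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def mean_between_distance(cluster_1, cluster_2):
    """Average distance over all pairs with one point taken from each cluster."""
    pairs = list(product(cluster_1, cluster_2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical clusters of 2-D points.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

within_a = mean_within_distance(cluster_a)
within_b = mean_within_distance(cluster_b)
between = mean_between_distance(cluster_a, cluster_b)

# A good grouping has small within-cluster distances relative to the between-cluster distance.
print(within_a, within_b, between)
```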
Unlike with classification rules, before clustering it is not known how many groups the data will be divided into or what those groups will be, nor which spatial discrimination rules will define them. The aim is to discover functional relationships among the attributes of spatial entities; the mined knowledge is expressed as mathematical equations whose variables are the attribute names. Clustering touches many fields: data mining, statistics, machine learning, spatial database technology, biology, marketing, and others. Common clustering methods include k-means, k-medoids, CLARANS, BIRCH, CLIQUE, DBSCAN, and so on.
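To make the first method in that list concrete, here is a minimal pure-Python sketch of k-means. It is a simplified version under stated assumptions: Euclidean distance is used, the first k points serve as the initial centroids (real implementations usually choose them randomly or with a smarter seeding step), and the function name `k_means` and the sample points are hypothetical.

```python
from math import dist


def k_means(points, k, max_iterations=100):
    """Return (centroids, assignments) for a basic k-means run."""
    # Simplifying assumption: the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the run has converged.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Hypothetical unlabeled data: note that no class labels are supplied.
points = [
    (1.0, 1.0), (1.5, 2.0), (8.0, 8.0),
    (9.0, 9.5), (1.0, 0.5), (8.5, 9.0),
]

centroids, assignments = k_means(points, 2)
print(assignments)
```

Only the number of clusters k is given here; the groups themselves emerge from the data. Methods such as DBSCAN go further and do not require k to be fixed beforehand.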