Differences between data classification and clustering

Source: Internet
Author: User

Put simply, classification (or categorization) means labeling objects according to some standard and then sorting them by those labels.
Clustering means using some form of cluster analysis to discover, without any labels given in advance, why objects group together.

The difference is that in classification the categories are defined in advance and their number is fixed. The classifier must be trained on a manually labeled training corpus, so classification is a form of supervised learning. Clustering has no predefined categories, and the number of categories is not known beforehand. It needs neither manual labeling nor a pre-trained classifier; the classes emerge automatically during clustering. Classification suits situations where the categories or the classification system already exist, such as shelving books according to a national library classification scheme. Clustering suits situations where there is no classification system or the number of categories is uncertain. It is generally used as the front end of other applications, for example multi-document summarization and the clustering of search engine results after retrieval (meta-search).
The purpose of classification is to learn a classification function or classification model (also known as a classifier). This model maps data items in the database to one of a given set of classes. Constructing a classifier requires a training sample dataset as input. A training set consists of database records or tuples, each of which is a feature vector made up of the values of the relevant fields (also known as attributes or features). Each training sample also carries a category label. A sample can therefore be written as (v1, v2, ..., vn; c), where vi is a field value and c is the category. Classifier construction methods include statistical methods, machine learning methods, and neural network methods.
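The (v1, v2, ..., vn; c) sample format can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier, chosen here only because it is short; the article does not prescribe a particular method. The function names, labels, and data values are hypothetical.

```python
from collections import defaultdict
from math import dist


def train_nearest_centroid(samples):
    """Learn one centroid per category from labeled samples.

    Each sample is (feature_vector, label), i.e. (v1, ..., vn; c).
    """
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    # The "model" is simply the mean feature vector of each category.
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, features):
    """Map a feature vector to the category with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training set: the categories are known in advance.
training_set = [
    ((1.0, 1.2), "small"),
    ((0.8, 1.0), "small"),
    ((5.0, 5.5), "large"),
    ((5.2, 4.9), "large"),
]

model = train_nearest_centroid(training_set)
prediction = classify(model, (4.8, 5.1))
print(prediction)
```

The set of categories is fixed by the labels in the training set; the classifier can never output a class it has not seen.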
Clustering groups unlabeled samples into different groups on the principle that like attracts like ("birds of a feather flock together"). Each such set of data objects is called a cluster, and the process also describes each of these clusters. The aim is for samples in the same cluster to be similar to one another and for samples in different clusters to be sufficiently dissimilar. Unlike classification, before clustering you do not know how many groups the data will be divided into, what those groups are, or which spatial discrimination rules define them. The objective is to discover the functional relationships between the attributes of spatial objects; the knowledge mined is expressed as mathematical equations whose variables are the attribute names. Clustering is developing rapidly in data mining, statistics, machine learning, spatial database technology, biology, and marketing, and cluster analysis has become an active research topic in data mining. Common clustering algorithms include k-means, k-medoids, CLARANS, BIRCH, CLIQUE, and DBSCAN.
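As a rough illustration of grouping unlabeled samples, here is a minimal k-means sketch. It makes simplifying assumptions that are not from the article: the number of clusters k is supplied by the caller, and the first k points serve as initial centroids (real implementations usually initialize randomly). The function names and data values are hypothetical.

```python
from math import dist


def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: dist(point, centroids[i]))
        clusters[nearest].append(point)
    return clusters


def k_means(points, k, max_iterations=100):
    """Group unlabeled points into k clusters."""
    centroids = list(points[:k])  # naive deterministic initialization
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its cluster (keep it if empty).
        updated = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical unlabeled data: no category tags are given.
points = [(1.0, 1.2), (0.8, 1.0), (5.0, 5.5),
          (5.2, 4.9), (0.6, 0.9), (4.9, 5.1)]

centroids, clusters = k_means(points, k=2)
for cluster in clusters:
    print(sorted(cluster))
```

No labels appear anywhere in the input; the two groups emerge from the distances alone. Algorithms such as DBSCAN go further and do not require the number of clusters up front.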
