The essential difference between classification and clustering in machine learning

Last Update:2015-07-02 Source: Internet

Author: User

Tags svm


Two major kinds of problems in machine learning are classification and clustering.
In everyday usage we often do not distinguish between the two, treating clustering as classification and classification as more or less the same as clustering. Below we look at the essential difference between classification and clustering in data mining.

Classification

Classification is defined in several ways, all of which express the same idea.

Classification: the task of learning a target function f that maps each attribute set X to one of the predefined class labels Y.
Classification trains a learner (that is, obtains some target function) on a given set of samples whose categories are known, so that it can classify samples whose categories are unknown. This is supervised learning.
Classification: learning the relationship between sample attributes and class labels.
In our own words: from known samples (with both attributes and class labels) we obtain a classification model, that is, a function from sample attributes to class labels, and then use that function to classify samples that have attributes only.
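The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nearest-centroid rule, the function names train_nearest_centroid and classify, and the toy data are all hypothetical choices made for this sketch, not something the article prescribes.

```python
from collections import defaultdict
from math import dist

def train_nearest_centroid(samples, labels):
    """Learn one mean vector (centroid) per known class label."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    # Average each coordinate over the samples of a class.
    return {
        y: tuple(sum(col) / len(xs) for col in zip(*xs))
        for y, xs in grouped.items()
    }

def classify(model, x):
    """Apply the learned function f: return the label of the closest centroid."""
    return min(model, key=lambda y: dist(model[y], x))

# Labeled training data: attribute sets X with predefined class labels Y.
train_x = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (4.8, 5.1), (5.3, 4.9)]
train_y = ["small", "small", "small", "large", "large", "large"]

model = train_nearest_centroid(train_x, train_y)

# New samples that have attributes only; the model supplies the labels.
predictions = [classify(model, x) for x in [(0.9, 1.1), (5.1, 5.0)]]
print(predictions)  # ['small', 'large']
```

The training step sees both attributes and labels, while the prediction step sees attributes only, which is the supervised pattern described above.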

Limitations of classification algorithms

As a supervised learning method, classification requires that the information about each category be known clearly in advance, and that every item to be classified belong to one of those categories. These conditions often do not hold, especially with large amounts of data, where preprocessing the data to meet the requirements of a classification algorithm is very expensive. In such cases a clustering algorithm is worth considering.
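One way to see this limitation is that a classifier can only answer with a label it already knows. In the hedged sketch below, the two class centroids, the label names, and the classify helper are hypothetical values invented for illustration.

```python
from math import dist

# Hypothetical centroids that a classifier has learned for two known classes.
known_classes = {"cat": (1.0, 1.0), "dog": (5.0, 5.0)}

def classify(x):
    """Return the known label whose centroid is closest to x."""
    return min(known_classes, key=lambda label: dist(known_classes[label], x))

# A sample from a category that was never labeled is still forced
# into one of the predefined classes.
outlier = (40.0, 40.0)
forced_label = classify(outlier)
print(forced_label)
```

The sample is far from both classes, yet the classifier has no way to say "none of the above" unless such a category was defined and labeled beforehand.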

Clustering

Some concepts related to clustering are as follows.

Clustering starts without knowing the class label of any sample. The hope is that an algorithm can divide a set of samples of unknown category into several groups. When clustering, we do not care what a given group is; the goal is only to bring similar things together. In machine learning this is called unsupervised learning.
Clustering is usually defined in terms of some distance or similarity between samples: similar (close) samples are placed in the same cluster, and dissimilar (distant) samples are placed in different clusters.
Goal of clustering: objects within a group are similar (related) to one another, and objects in different groups are different (unrelated). The greater the similarity within groups and the greater the difference between groups, the better the clustering.
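As a rough illustration of distance-based clustering, the sketch below runs plain K-means (Lloyd's algorithm) on unlabeled points and then compares distances within clusters to the distance between clusters. The choice of K-means, the helper names, the fixed initial centroids, and the data are assumptions made for this example only.

```python
from math import dist

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return clusters, centroids

# Unlabeled data: attributes only, no class labels.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (4.8, 5.1), (5.3, 4.9)]

# Assumption: start from two of the points so the run is deterministic.
clusters, centroids = k_means(points, [points[0], points[3]])

# Largest distance from a point to its own centroid vs. distance between centroids.
within = max(dist(p, c) for cl, c in zip(clusters, centroids) for p in cl)
between = dist(centroids[0], centroids[1])
print(clusters, within < between)
```

The algorithm never sees a label; the groups come out of the distances alone, and a small within-cluster distance relative to the between-cluster distance corresponds to the goal stated above.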

Comparison of classification and clustering

Cluster analysis studies how to divide samples into several classes without training data.
In classification, you know which classes exist in the target database, and what remains is to mark each record with the class it belongs to.
Clustering has to group a given set of unlabeled patterns into meaningful clusters. It is not known in advance how many classes the target database contains; the aim is to group all records into different classes or clusters such that some metric (for example, distance) is minimized within each cluster and maximized between different clusters.
Unlike classification, unsupervised learning does not rely on predefined classes or on training instances with class labels; the clustering algorithm must determine the labels automatically. In classification learning, by contrast, every training instance or data sample carries a class label.

A concrete example

I have recently been studying two algorithms, and they happen to illustrate the difference between classification and clustering.
One difference between SVM and the binary (bisecting) K-means algorithm: the support vector machine (SVM) is a classification algorithm, whereas binary (bisecting) K-means is a clustering algorithm.

Page 306 of Introduction to Data Mining (complete edition) has this passage: clustering can be regarded as a form of classification in that it labels objects with class (cluster) labels, but it derives those labels only from the data. By contrast, the classification discussed earlier is supervised classification: a model is developed from objects with known class labels and is then used to assign class labels to new, unlabeled objects. For this reason, cluster analysis is sometimes called unsupervised classification. In data mining, when the term classification is used without qualification, it usually refers to supervised classification.

Therefore, one difference between SVM and binary (bisecting) K-means is that the support vector machine (SVM) is a supervised classification algorithm, whereas binary (bisecting) K-means is an unsupervised classification algorithm.
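To make the unsupervised side concrete, here is a minimal sketch that assumes "binary K-means" refers to what is usually called bisecting K-means: start with one cluster and repeatedly split the cluster with the largest sum of squared errors using 2-means. The function names, the deterministic choice of initial centroids, and the data are assumptions of this sketch, not details from the article.

```python
from math import dist

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def sse(cluster):
    """Sum of squared distances from each point to the cluster mean."""
    c = mean(cluster)
    return sum(dist(p, c) ** 2 for p in cluster)

def two_means(points, max_iter=100):
    """Split one cluster in two with plain K-means (K = 2).

    Assumption: the initial centroids are the first point and the point
    farthest from it, chosen only to keep the sketch deterministic.
    """
    first = points[0]
    centroids = [first, max(points, key=lambda p: dist(p, first))]
    for _ in range(max_iter):
        halves = ([], [])
        for p in points:
            if dist(p, centroids[0]) <= dist(p, centroids[1]):
                halves[0].append(p)
            else:
                halves[1].append(p)
        updated = [mean(h) if h else c for h, c in zip(halves, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return [h for h in halves if h]

def bisecting_k_means(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        worst = max(splittable, key=sse)
        parts = two_means(worst)
        if len(parts) < 2:
            break  # all points identical; nothing left to split
        clusters.remove(worst)
        clusters.extend(parts)
    return clusters

# Unlabeled points; no class labels are supplied anywhere.
data = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 5.1), (9.0, 1.0), (9.1, 1.2)]
result = bisecting_k_means(data, 3)
print(sorted(result))
```

Unlike an SVM, which needs a class label for every training sample before it can fit a decision boundary, this procedure takes only the attribute vectors and the desired number of clusters.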

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
