The essential difference between classification and clustering in machine learning


Two of the major kinds of problem in machine learning are classification and clustering.
In everyday usage we often fail to distinguish these two concepts, treating clustering as classification and classification as more or less the same thing as clustering. Below, we look specifically at the essential difference between classification and clustering in data mining.

Classification

Classification is described in several ways, but the descriptions all mean the same thing.

Classification: the task of classification is to learn a target function f that maps each attribute set X to one of the predefined class labels Y.

Classification trains a learner (that is, obtains a target function) from a given set of samples whose categories are known, so that it can classify samples whose class is unknown. This is supervised learning.
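As a rough sketch of this train-then-classify pattern, the snippet below uses a nearest-centroid rule, chosen here only because it fits in a few lines; it is a stand-in for any supervised classifier, not a method the text prescribes. The data, the labels "small" and "large", and the function names are all hypothetical.

```python
from collections import defaultdict
from math import dist


def train_nearest_centroid(samples, labels):
    """Learn one centroid per known class from labeled samples."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    # The "model" here is simply the mean attribute vector of each class.
    return {
        label: tuple(sum(col) / len(points) for col in zip(*points))
        for label, points in grouped.items()
    }


def classify(model, x):
    """Assign x the label of the closest class centroid."""
    return min(model, key=lambda label: dist(model[label], x))


# Hypothetical training data: attributes plus known class labels.
train_x = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
train_y = ["small", "small", "small", "large", "large", "large"]

model = train_nearest_centroid(train_x, train_y)

# New samples carry attributes only; the trained model supplies the label.
predictions = [classify(model, x) for x in [(1.2, 1.4), (9.1, 9.0)]]
print(predictions)
```

The point to notice is that the class labels are an input to training: the set of possible answers is fixed before any unknown sample is seen.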

Classification: learning the relationship between sample attributes and class labels.
In our own words: from some known samples (with both attributes and class labels) we obtain a classification model (that is, the function from sample attributes to class labels), and then use this model to classify sample data that contains only attributes.

Limitations of the classification algorithm

As a supervised learning method, classification requires that all the categories be known clearly in advance, and it assumes that every item to be classified belongs to one of them. These conditions often do not hold. Especially with massive data, preprocessing the data so that it meets the requirements of a classification algorithm can be very costly, and in that case a clustering algorithm is worth considering.

Clustering

Some concepts associated with clustering are as follows. In clustering, the category label of every sample is unknown beforehand, and we want some algorithm to divide a set of samples of unknown category into groups. When clustering, we do not care what a given group actually is; the goal is simply to bring similar things together. In machine learning this is called unsupervised learning.

Clustering is usually defined in terms of some distance or similarity between samples: similar (or nearby) samples are gathered into the same class, and dissimilar (or distant) samples are placed in different classes.

The goal of clustering is that objects within a group are similar (related) to each other, while objects in different groups are different (unrelated). The greater the similarity within a group and the greater the difference between groups, the better the clustering.

Comparison of classification and clustering

Cluster analysis studies how to divide samples into several classes without any training. In classification, the classes that exist in the target database are known, and the task is to mark each record with the class it belongs to. Clustering instead has to organize a number of unlabeled patterns into meaningful clusters. It applies when you do not know in advance how many classes the target database contains and want to group all the records into classes or clusters such that, under some measure (for example, distance), records are as close as possible within a cluster and as far apart as possible between clusters. Unlike classification, unsupervised learning does not rely on predefined classes or on class-labeled training instances; the labels have to be determined automatically by the clustering algorithm, whereas the instances or data samples used in classification learning already carry class labels.

An illustration

I have recently been studying two algorithms that happen to illustrate the difference between classification algorithms and clustering algorithms.
One of the differences between SVM and the bisecting k-means algorithm: the support vector machine (SVM) is a classification algorithm, while bisecting k-means is a clustering algorithm.

Page 306 of the book Introduction to Data Mining (complete edition) says that clustering can be regarded as a form of classification in that it labels objects with class (cluster) labels, but it derives these labels only from the data. In contrast, the classification described above is supervised classification: a model developed from objects with known class labels assigns a class label to new, unlabeled objects. For this reason, cluster analysis is sometimes called unsupervised classification. In data mining, the term classification used without qualification usually refers to supervised classification.

Therefore, one of the differences between SVM and the bisecting k-means algorithm is that the support vector machine (SVM) is a supervised classification algorithm, while bisecting k-means is an unsupervised classification algorithm.
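To make the unsupervised side concrete, here is a minimal sketch of bisecting k-means in plain Python. Several choices are assumptions made for illustration: the cluster with the largest sum of squared errors (SSE) is the one split next, and each split is seeded deterministically from the two most distant-looking points, whereas textbook versions typically try several random bisections and keep the best. The data and function names are hypothetical.

```python
from math import dist


def mean(points):
    """Component-wise mean of a list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(dist(p, center) ** 2 for p in points)


def two_means(points, iterations=10):
    """Split one cluster in two with a basic 2-means loop."""
    # Deterministic seeding (a simplification): the point farthest from the
    # mean, then the point farthest from that one.
    center = mean(points)
    first = max(points, key=lambda p: dist(p, center))
    second = max(points, key=lambda p: dist(p, first))
    centers = [first, second]
    for _ in range(iterations):
        halves = ([], [])
        for p in points:
            nearer = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            halves[nearer].append(p)
        if not halves[0] or not halves[1]:
            break  # degenerate split, e.g. all points identical
        centers = [mean(halves[0]), mean(halves[1])]
    return [h for h in halves if h]


def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)
        parts = two_means(worst)
        if len(parts) < 2:
            break  # nothing left that can be split
        clusters.remove(worst)
        clusters.extend(parts)
    return clusters


# Hypothetical unlabeled data: attributes only, no class labels anywhere.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.5),
          (1.0, 9.0), (1.5, 8.0), (2.0, 9.5)]

clusters = bisecting_kmeans(points, k=3)
for i, cluster in enumerate(clusters):
    print(i, cluster)
```

No labels are passed in at any point. The cluster indices that come out are derived from the data alone and carry no meaning of their own, which is exactly the sense in which clustering is "unsupervised classification".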
