The distinction between classification and clustering

Classification:
A classifier "learns" from the training data it is given and thereby gains the ability to classify data it has not seen before. This process is usually called supervised learning. Put simply, classification assigns each item (for example, a text) to one of a set of existing categories based on its features or attributes.
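To make the idea of learning from labeled data concrete, here is a minimal sketch of a nearest-neighbor classifier, one of the simplest classification rules. The feature values, the category names, and the function name `knn_classify` are made up for illustration; a real application would need proper feature scaling and evaluation.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled samples.

    train: list of (features, label) pairs; query: a feature tuple.
    """
    # Sort the labeled samples by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # The predicted category is the most common label among those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: every sample already carries a known category.
training_data = [
    ((150.0, 45.0), "small"),
    ((155.0, 50.0), "small"),
    ((160.0, 52.0), "small"),
    ((180.0, 80.0), "large"),
    ((185.0, 90.0), "large"),
    ((190.0, 95.0), "large"),
]

print(knn_classify(training_data, (158.0, 49.0)))  # small
print(knn_classify(training_data, (183.0, 85.0)))  # large
```

Note that the classifier can only ever answer with a category that appears in the training data, which is exactly the "existing categories" constraint described above.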
Common classification algorithms include decision trees, the naive Bayes classifier, support vector machine (SVM) classifiers, neural networks, the k-nearest neighbor method (k-NN), and fuzzy classification. As a supervised learning method, classification requires that all categories be known in advance, and it assumes that every item to be classified belongs to one of them. In practice these conditions often do not hold, especially with massive data sets, where preprocessing the data to meet a classification algorithm's requirements can be very costly. In such cases a clustering algorithm is worth considering.

Clustering:
Put simply, clustering groups similar things together. When clustering, we do not care what a given group actually is; the goal is only to bring similar things together. A clustering algorithm therefore usually needs nothing more than a way to compute similarity before it can start working, and it usually does not need training data to learn from. In machine learning this is called unsupervised learning. The purpose of cluster analysis is to group similar items so that individuals in the same cluster are highly similar to each other, while individuals in different clusters differ greatly.
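The point that a clustering algorithm only needs a similarity measure can be sketched as follows. This is a simple threshold-based grouping (points closer than a chosen distance are linked, and linked points share a cluster), picked here only as an illustration; it is not a specific algorithm named in the text, and the sample points, the `threshold` value, and the function name `cluster_by_similarity` are assumptions.

```python
import math

def cluster_by_similarity(points, threshold):
    """Group points without any labels, using only pairwise distance.

    Two points end up in the same cluster if they are connected by a chain
    of points in which each step is at most `threshold` apart.
    """
    clusters = []
    for p in points:
        close, far = [], []
        # Split existing clusters into those near p and those far from p.
        for c in clusters:
            if any(math.dist(p, q) <= threshold for q in c):
                close.append(c)
            else:
                far.append(c)
        # Merge p with every nearby cluster; leave the others untouched.
        merged = [p] + [q for c in close for q in c]
        clusters = far + [merged]
    return clusters

# Hypothetical unlabeled data: no categories are given in advance.
points = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (0.8, 0.9), (8.3, 7.9), (7.9, 8.4)]

clusters = cluster_by_similarity(points, threshold=1.0)
for c in clusters:
    print(sorted(c))
```

Unlike the classifier, this code never sees a category name. It discovers the two groups purely from distances, and it is up to the user to interpret what each group means.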