Supervised learning and unsupervised learning

Common machine learning methods fall mainly into two categories: supervised learning and unsupervised learning.
Supervised learning, often equated with classification, trains an optimal model (a function, or set of functions, that is optimal under some evaluation criterion) from training samples, that is, known inputs paired with their corresponding outputs. The trained model maps every input to an output, and a simple decision on that output achieves classification; the model therefore also has the ability to classify previously unseen data. This mirrors how people come to understand things: from childhood, adults tell us "this is a bird", "that is a pig", "that is a house", and so on. The scenes we see are the input data, and the adults' judgments about those scenes (house or bird) are the corresponding outputs. After seeing enough examples, the brain gradually forms a generalized model (the function, or functions, obtained by training), so that we no longer need an adult to point things out: we can tell on our own which is a house and which is a bird. Typical supervised learning algorithms include KNN and SVM.
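As a minimal sketch of this setting (assuming scikit-learn is available; the tiny "bird vs. house" feature data below are invented purely for illustration):

```python
# A minimal supervised-learning sketch with KNN (scikit-learn).
# The toy dataset is invented purely for illustration.
from sklearn.neighbors import KNeighborsClassifier

# Training samples: known inputs (features) and their corresponding outputs (labels).
X_train = [[0.9, 0.1], [1.0, 0.2], [0.1, 0.9], [0.2, 1.0]]
y_train = ["bird", "bird", "house", "house"]

# "Training" builds the model (for KNN, this simply stores the labeled samples).
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# The model now maps unseen inputs to outputs, i.e., it classifies unknown data.
print(model.predict([[0.95, 0.15]]))  # -> ['bird']
```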
Unsupervised learning is the other major family of learning methods. It differs from supervised learning in that no training samples are available in advance; we must model the data directly. This may sound a bit mysterious, but we use unsupervised learning in many everyday situations. For example, suppose we visit a painting exhibition knowing nothing about art. After viewing many works, we can still sort them into different schools, say which paintings look hazier and which more realistic, even if we do not know the terms for those schools; at the very least we can separate the works into two classes. The typical example of unsupervised learning is clustering. The purpose of clustering is to group similar things together, without caring what each group actually is. Consequently, a clustering algorithm usually only needs to know how to compute similarity before it can start working.
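A minimal clustering sketch, again assuming scikit-learn; the similarity measure here is ordinary Euclidean distance, and the two "schools" emerge without any labels:

```python
# A minimal unsupervised-learning sketch: k-means clustering (scikit-learn).
# No labels are given; the algorithm only needs a distance/similarity measure
# (Euclidean distance here) to group similar points together.
from sklearn.cluster import KMeans

# Unlabeled data: two loose groups, but we never tell the algorithm which is which.
X = [[0.9, 0.1], [1.0, 0.2], [0.8, 0.15], [0.1, 0.9], [0.2, 1.0], [0.15, 0.85]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] -- two clusters, names unknown
```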
So when should we use supervised learning, and when unsupervised learning? I was asked this question in an interview before I began to consider the answer seriously. A simple answer follows from the definitions: if training samples (labeled training data) are available for the classification task, supervised learning can be considered; if there are no training samples, supervised learning is impossible. In practice, however, even without a ready-made training set we can often label some samples from the data to be classified by hand, using our own eyes, and take those as training samples, thereby creating the conditions for supervised learning.

That said, sometimes the data are too opaque for manual labeling: the information at hand is not in an intuitive form but is a mass of raw numbers, so a person cannot easily assign classes directly. A concrete example is the bag-of-words model, where K-means is used to cluster projected feature data. We use K-means there because all we have is a large pile of very high-dimensional vectors; if we want to divide them into, say, 50 classes, we have no way to look at each vector and say which class it should belong to. In such situations only unsupervised learning can help us.

One can then ask a further question: if training samples are available (or can be obtained), will supervised learning always be more appropriate than unsupervised learning? (Naive intuition says that learning from a skilled teacher beats figuring things out on one's own.) I believe this is usually the case, but it depends on how the training data were obtained. In my recent research, I hand-labeled a large number of training samples (essentially all of them accurate), and in feature space they were almost linearly separable; yet near the decision boundary there were always a few confusable samples, which a linear classifier trained on the data would misclassify. When a Gaussian mixture model (GMM) was used to partition the data instead, those confusable points were categorized more correctly. One explanation for this phenomenon is that the data, whether training samples or data to be clustered, are not all independently and identically distributed; there are dependencies among the samples and in how they are distributed. In much of the supervised learning material I had read, this assumption behind the training data (independent and identical distribution) was never discussed, until I came across a remark in a book. Across different scenarios, the distributions of positive and negative samples can shift (by a large or a small offset), so supervised learning may not always be as effective as unsupervised learning.
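To make the GMM comparison concrete, here is a hypothetical sketch (not the author's actual data or code) that fits both a linear classifier and a label-free Gaussian mixture on two overlapping synthetic classes:

```python
# Hypothetical sketch (not the author's experiment): compare a linear classifier
# with a Gaussian mixture model (GMM) on two overlapping Gaussian blobs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two classes whose distributions overlap near the decision boundary.
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# Supervised: a linear classifier trained on the labels.
linear = LogisticRegression().fit(X, y)
print("linear accuracy:", linear.score(X, y))

# Unsupervised: a 2-component GMM fitted without ever seeing the labels.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
pred = gmm.predict(X)
# Cluster indices are arbitrary, so align them with the labels before scoring.
acc = max(np.mean(pred == y), np.mean(pred == 1 - y))
print("GMM accuracy:", acc)
```

Because the GMM models each class-conditional distribution rather than a single separating hyperplane, it can handle the points near the boundary differently from the linear classifier, which is the behavior described above.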
Reprinted from: http://blog.csdn.net/jwh_bupt/article/details/7654120