Most of us have bought a watermelon at some point in our lives. When we do, our elders pass on their experience: for example, a melon that makes a certain sound when you tap its surface is a good melon. They can judge melons by such characteristics because of their accumulated life experience, and as that experience grows richer, their ability to predict a good melon improves as well. Herbert A. Simon gave the following definition of "learning": "If a system can improve its performance by executing some process, that is learning." By this definition, learning has indeed taken place in our elders through the process above. So if a computer were given the same kind of experience, could it also learn the skill of judging watermelon quality? Can data and statistical methods make computers learn?
To answer this question, we first collect "experience" for the machine: a data set. A data set is a collection of samples, each described by several attributes (features). For example, the abstract class "watermelon" has three attributes: "color", "root", and "knock". If we treat these three attributes as coordinate axes, they span a three-dimensional space. Each watermelon takes its own value on each attribute, so when mapped to a vector, every watermelon falls on a point in this three-dimensional space. The space spanned by the attributes is called the sample space. Consider a data set like the following:
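As a concrete sketch, such a data set might look like this in Python. The four samples and their attribute values are illustrative assumptions, chosen only to be consistent with the hypotheses discussed below:

```python
# A toy watermelon data set: each sample is a point in the
# three-dimensional sample space spanned by (color, root, knock).
# The attribute values are illustrative assumptions.
dataset = [
    # (color,   root,     knock,   is_good_melon)
    ("green",  "curled", "dull",  True),
    ("dark",   "curled", "dull",  True),
    ("green",  "stiff",  "crisp", False),
    ("dark",   "stiff",  "crisp", False),
]
```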
By observing this simple data set of four watermelons, we might form the following hypotheses:
1. A watermelon with knock = dull is a good melon
2. A watermelon with root = curled is a good melon
3. A watermelon with knock = dull and root = curled is a good melon
These are hypotheses we arrive at by logical reasoning, and such hypotheses are exactly the "models" that machine learning needs to learn.
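One simple way to represent such a hypothesis on a machine is as a tuple of required attribute values, where "*" stands for "any value" (this wildcard is made precise below). A minimal sketch, assuming the toy data format above:

```python
# A hypothesis is a triple of required attribute values; "*" means
# "any value is acceptable". A melon is predicted to be good only if
# every attribute matches its requirement.
def matches(hypothesis, sample):
    return all(h == "*" or h == v for h, v in zip(hypothesis, sample[:3]))

h1 = ("*", "*",      "dull")  # hypothesis 1: knock = dull
h2 = ("*", "curled", "*")     # hypothesis 2: root = curled
h3 = ("*", "curled", "dull")  # hypothesis 3: both

print(matches(h3, ("green", "curled", "dull", True)))  # True
```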
But how can a machine learn these hypotheses?
The basic idea is to give the machine all possible hypotheses and validate each one, rejecting any hypothesis that is contradicted by the data. In the watermelon case above, suppose "color", "root", and "knock" have 3, 2, and 2 possible values respectively. We also allow an attribute to take any value; for example: regardless of color, a melon with root = curled and knock = dull is a good melon. So in addition to its concrete values, each attribute can take the wildcard value * (any value). The problem above therefore yields 4 * 3 * 3 = 36 hypotheses in total. This set of all hypotheses is called the hypothesis space. The hypothesis space for the watermelon problem is shown in the figure:
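A quick way to convince ourselves of the count of 36 is to enumerate the hypothesis space directly. A small sketch, again assuming the illustrative attribute values from above:

```python
from itertools import product

# Possible values per attribute (3, 2, 2 as above); the concrete value
# names are assumptions. Adding "*" to each attribute gives
# 4 * 3 * 3 = 36 hypotheses in total.
colors = ["green", "dark", "light", "*"]
roots  = ["curled", "stiff", "*"]
knocks = ["dull", "crisp", "*"]

hypothesis_space = list(product(colors, roots, knocks))
print(len(hypothesis_space))  # 36
```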
What the machine has to do with this hypothesis space is search it. Hypotheses that are inconsistent with the positive examples, or consistent with the negative examples, are deleted. The hypotheses that remain are those consistent with the training set (the data used for learning), and this set is called the version space. How to choose which hypothesis in the version space to adopt is itself a subtle question, perhaps worth a separate article.
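This search can be sketched as a single filtering pass, reusing the toy dataset, the matches function, and the hypothesis_space from the snippets above: a hypothesis is kept only if its prediction agrees with the label of every training sample.

```python
# Keep only hypotheses consistent with the training set: they must
# match every positive example and reject every negative one.
version_space = [
    h for h in hypothesis_space
    if all(matches(h, s) == s[3] for s in dataset)
]
print(version_space)
# With the toy data above, exactly the three hypotheses listed
# earlier survive:
# [('*', 'curled', 'dull'), ('*', 'curled', '*'), ('*', '*', 'dull')]
```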