Content Summary
This post covers two topics:
1. Model Selection
2. Bayesian Statistics and Regularization
The core topic is model selection. It does not involve many complex formulas, but it provides higher-level guidance that is often essential in practice. Let's begin with model selection.
Suppose we are choosing among several models for a learning problem. For example, with a polynomial regression model $h_\theta(x) = g(\theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_k x^k)$, we would like to decide whether $k$ should be 0, 1, ..., or 10. Better still, we would like a procedure that picks $k$ automatically, that is, one that chooses among the different models so as to strike a good balance between underfitting and overfitting.
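To make the family of candidate models concrete, here is a minimal sketch that fits a polynomial of each degree $k = 0, 1, \ldots, 10$ and reports how closely it fits the data it was trained on. It assumes $g$ is the identity (plain least-squares regression), and the synthetic data and the helper names `fit_polynomial` and `mean_squared_error` are illustrative choices, not part of the original notes.

```python
import numpy as np

def fit_polynomial(x, y, k):
    """Least-squares fit of a degree-k polynomial (coefficients, highest power first)."""
    return np.polyfit(x, y, deg=k)

def mean_squared_error(coeffs, x, y):
    """Mean squared difference between the polynomial's predictions and y."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(3.0 * x) + rng.normal(scale=0.1, size=x.shape)

# One candidate model per degree k; the error is measured on the training data itself.
train_errors = {k: mean_squared_error(fit_polynomial(x, y, k), x, y)
                for k in range(0, 11)}

for k, err in train_errors.items():
    print(f"k={k:2d}  training MSE={err:.4f}")
```

Because each higher-degree polynomial contains the lower-degree ones as special cases, the error measured on the training data does not go up as $k$ grows, so a larger $k$ always looks at least as good by this measure alone.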
Let's first assume a finite set of models $\mathcal{M} = \{M_1, M_2, \ldots, M_d\}$ from which we want to select one. In the example above, $M_i$ is the polynomial regression model with $k = i$. So how do we choose among the models in this set? Here we describe the cross-validation approach.
Cross-validation
A simple way to solve the model selection problem above is to train each model on 70% of the data and use the remaining 30% to compute its error on data it has not seen. We then compare these held-out (validation) errors across the models and choose the model whose error is smallest. If you are not familiar with these error measures, see the earlier post in this series, "Learning Theory: Empirical Risk Minimization (Andrew Ng Machine Learning Notes VII)".
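The 70/30 procedure above can be sketched as follows. This is only an illustration under stated assumptions: the candidate models are polynomials of degree $k$ fitted by least squares, the error is mean squared error, and the function name `hold_out_select` and the synthetic quadratic data are hypothetical.

```python
import numpy as np

def hold_out_select(x, y, degrees, train_fraction=0.7, seed=0):
    """Pick the polynomial degree with the smallest error on a held-out split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))            # shuffle before splitting
    n_train = int(train_fraction * len(x))
    train_idx, val_idx = order[:n_train], order[n_train:]

    val_errors = {}
    for k in degrees:
        # Train each candidate model on the 70% split only.
        coeffs = np.polyfit(x[train_idx], y[train_idx], deg=k)
        # Measure its error on the 30% it never saw during training.
        residuals = np.polyval(coeffs, x[val_idx]) - y[val_idx]
        val_errors[k] = float(np.mean(residuals ** 2))

    best_k = min(val_errors, key=val_errors.get)
    return best_k, val_errors

# Synthetic data (an assumption for illustration): a noisy quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=100)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(scale=0.1, size=x.shape)

best_k, val_errors = hold_out_select(x, y, degrees=range(0, 11))
print("selected k:", best_k)
```

Each model is trained exactly once, which is what makes this approach cheap; the cost is that 30% of the data is never used for fitting.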
If training data is easy to obtain, this is a good method, because each model only needs to be trained once to find a reasonably good one. But training data is often hard to come by. I once collected data for an experiment myself, and it was a painful process. Since we want to make efficient use of hard-won training data, the k-fold cross-validation algorithm was proposed (here $k$ is the number of folds, not the polynomial degree). The algorithm proceeds as follows:
Divide the training set $S$ into $k$ parts, denoted $S_1, S_2, \ldots, S_k$.
For each model $M_i$, perform the following procedure:
For $j = 1, 2, \ldots, k$:
in S1,s2,... Sj−1,