Content Summary
Up to this point, supervised learning is essentially complete. This post is mainly about the theory of machine learning: when to use which learning algorithm, and what characteristics or advantages different algorithms have. When fitting a model, choosing the model is really a tradeoff between under-fitting and over-fitting; we also care about whether the size of our training set is appropriate, and how to evaluate the quality of the final fitted function. Below we introduce the theory of empirical risk minimization, which answers these questions.

Empirical Risk Minimization
To explain empirical risk minimization, we first introduce two lemmas.

(The union bound) Suppose $A_1, A_2, \ldots, A_k$ are $k$ different events. Then $P(A_1 \cup A_2 \cup \cdots \cup A_k) \leq P(A_1) + P(A_2) + \cdots + P(A_k)$. You can draw a Venn diagram to convince yourself of this.

(Hoeffding inequality) Suppose $Z_1, Z_2, \ldots, Z_m$ are $m$ independent and identically distributed (i.i.d.) random variables drawn from a Bernoulli distribution with parameter $\phi$, i.e., $P(Z_i = 1) = \phi$ and $P(Z_i = 0) = 1 - \phi$. Let $\hat\phi = (1/m)\sum_{i=1}^{m} Z_i$; note that $\hat\phi$ is itself a random variable. Then for any $\gamma > 0$,

$$P(|\phi - \hat\phi| > \gamma) \leq 2\exp(-2\gamma^2 m).$$

In other words, as long as $m$ is large, the empirical average $\hat\phi$ will be close to the true parameter $\phi$ with high probability.
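As a quick sanity check, here is a minimal numerical sketch (not from the original post) of the Hoeffding inequality: we repeatedly draw samples of i.i.d. Bernoulli variables, measure how often the empirical mean deviates from the true parameter by more than $\gamma$, and compare that frequency with the bound $2\exp(-2\gamma^2 m)$. The specific values of `phi`, `gamma`, `m`, and `n_trials` are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

phi = 0.3       # true Bernoulli parameter (illustrative choice)
gamma = 0.05    # deviation threshold
m = 500         # sample size per trial
n_trials = 10_000

# Draw n_trials independent samples of size m and compute each empirical mean.
samples = rng.binomial(1, phi, size=(n_trials, m))
phi_hat = samples.mean(axis=1)

# Empirical frequency of a deviation larger than gamma.
empirical = np.mean(np.abs(phi_hat - phi) > gamma)

# Hoeffding's upper bound on that probability.
bound = 2 * np.exp(-2 * gamma**2 * m)

print(f"empirical P(|phi_hat - phi| > gamma) = {empirical:.4f}")
print(f"Hoeffding bound                      = {bound:.4f}")
```

Running this, the observed deviation frequency should come out no larger than the bound, consistent with the inequality; increasing `m` makes both shrink rapidly.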