Understanding Machine Learning: From Theory to Algorithms, Learning Notes (Week 1): Chapter 2, A Gentle Start


Contents
2.1 A general model: the statistical learning framework
    1. The learner's input
    2. The learner's output
    3. A simple data-generation model
    4. Measures of success
    5. A note on the information available to the learner
2.2 Empirical risk minimization
2.3 Empirical risk minimization with inductive bias

My homepage www.csxiaoyao.com

Chapter 2 analyzes the factors that must be considered in a learning problem. Take papayas as an example: to learn whether a papaya is tasty, we observe its color and its softness, and then decide whether to eat it.


The first step is to describe a formal model that can capture learning tasks of this kind.

2.1 A general model: the statistical learning framework

1. The learner's input

Domain set: X, for example, the set of all papayas.
Label set: Y. For now we only consider two-element label sets, such as {0,1} or {−1,+1}, indicating that a papaya is tasty or not tasty.
Training data: a finite sequence S = ((x_1, y_1), ..., (x_m, y_m)) whose elements are pairs (x, y) in X × Y; S is called the training set.
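
To make these objects concrete, here is a minimal Python sketch (my own illustration, not from the book): a papaya is encoded as a (color, softness) pair, the label set is {0, 1}, and the training set S is a list of (x, y) pairs. The feature encoding and all values are invented.

    # Illustrative encoding (not from the book): a papaya is described by two
    # features, color and softness, each a number in [0, 1].
    from typing import List, Tuple

    Papaya = Tuple[float, float]              # a domain point x in X
    Label = int                               # a label y in Y = {0, 1}
    TrainingSet = List[Tuple[Papaya, Label]]

    # A tiny training set S = ((x_1, y_1), ..., (x_m, y_m)); values are made up.
    S: TrainingSet = [
        ((0.7, 0.6), 1),   # reddish and medium-soft -> tasty
        ((0.2, 0.9), 0),   # green and very soft     -> not tasty
        ((0.8, 0.5), 1),
        ((0.1, 0.3), 0),
    ]
    m = len(S)             # the number of training examples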

2. The learner's output

The output is a prediction rule h: X → Y, also called a predictor, a hypothesis, or a classifier; for example, a rule that predicts whether a papaya at the farmers' market is tasty. A(S) denotes the hypothesis that a learning algorithm A returns when given the training sequence S.

3. A simple data-generation model

How is the training data generated? First, assume that each instance (papaya) is sampled according to some probability distribution D (the environment; in the book's running example, the papaya island). The learner knows nothing about this distribution. Next, assume there exists a correct labeling function f: X → Y, also unknown to the learner, such that y_i = f(x_i) for every i; the labeling function gives the correct label of every sample (whether the papaya is tasty). In summary, the training set S is generated by sampling points x_i according to D and labeling them with the correct labeling function f. (h denotes the learner's prediction rule; f denotes the true labeling function.)
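
The generation process itself can be simulated with a short Python sketch (again my own illustration, under the toy papaya encoding above); the specific distribution D and labeling rule f below are invented and, in a real learning problem, would be unknown to the learner:

    import random

    def sample_papaya(rng: random.Random):
        # The distribution D over papayas: here, color and softness are drawn
        # uniformly from [0, 1]. The learner never sees this code.
        return (rng.random(), rng.random())

    def f(papaya) -> int:
        # The true labeling function f: X -> Y, also hidden from the learner.
        # Invented rule: a papaya is tasty iff it is colorful and not too soft.
        color, softness = papaya
        return 1 if color > 0.5 and softness < 0.7 else 0

    def generate_training_set(m: int, seed: int = 0):
        # S is built by sampling x_i ~ D and labeling it with y_i = f(x_i).
        rng = random.Random(seed)
        xs = [sample_papaya(rng) for _ in range(m)]
        return [(x, f(x)) for x in xs]

    S = generate_training_set(m=50)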

4. Measures of success

Error of a classifier (predictor): the error of h is the probability that h(x) ≠ f(x), where x is a random sample drawn according to the distribution D.
Formally, given a domain subset A ⊆ X and the probability distribution D, D(A) is the probability of observing a point x ∈ A. In many cases A is specified by an indicator function π: X → {0,1}, that is, A = {x ∈ X : π(x) = 1}; in that case D(A) can also be written as P_{x∼D}[π(x)].
The error of a prediction rule h: X → Y is defined as:

L_{D,f}(h) = P_{x∼D}[h(x) ≠ f(x)] = D({x : h(x) ≠ f(x)})

where x is drawn at random according to D. L_{D,f}(h) is also called the generalization error, the risk, or the true error of h; the letter L stands for loss.
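
Because L_{D,f}(h) is a probability under D, it can be approximated by sampling whenever D and f are available, which is the case only in a simulation, never for the learner. A hedged sketch, reusing the invented toy distribution and labeling rule from the previous snippet:

    import random

    def f(papaya) -> int:
        # Invented true labeling function (same toy rule as above).
        color, softness = papaya
        return 1 if color > 0.5 and softness < 0.7 else 0

    def h(papaya) -> int:
        # Some hypothesis a learner might output; an invented rule that
        # looks only at color.
        color, _softness = papaya
        return 1 if color > 0.6 else 0

    def true_error(h, f, n_samples: int = 100_000, seed: int = 1) -> float:
        # Monte Carlo estimate of L_{D,f}(h) = P_{x~D}[h(x) != f(x)].
        rng = random.Random(seed)
        mistakes = 0
        for _ in range(n_samples):
            x = (rng.random(), rng.random())   # x ~ D (toy uniform distribution)
            if h(x) != f(x):
                mistakes += 1
        return mistakes / n_samples

    print(true_error(h, f))   # approximate generalization error of h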

5. A note on the information available to the learner

The distribution D and the labeling function f are unknown to the learner; all the learner can observe is the training set S.

2.2 Empirical risk minimization


Because the learner does not know D and f, the true error cannot be computed directly; only the training error (the empirical error over the training set) can be computed:

L_S(h) = |{i ∈ [m] : h(x_i) ≠ y_i}| / m

where [m] = {1, ..., m}. Choosing a predictor h that minimizes L_S(h) is called Empirical Risk Minimization, abbreviated ERM. ERM can overfit: a small L_S(h) does not imply a small L_{D,f}(h).
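
In contrast, the training error depends only on S and can always be computed. Below is a small sketch (mine, not the book's) of L_S(h), of an ERM rule over an explicitly given finite collection of hypotheses, and of a memorizing predictor that illustrates why a zero training error alone guarantees nothing:

    def empirical_error(h, S) -> float:
        # L_S(h) = |{i in [m] : h(x_i) != y_i}| / m
        return sum(1 for x, y in S if h(x) != y) / len(S)

    def erm(hypotheses, S):
        # The ERM rule: return a hypothesis from the given finite collection
        # that minimizes the training error on S.
        return min(hypotheses, key=lambda h: empirical_error(h, S))

    def memorizer(S):
        # Overfitting in its purest form: memorize the training set. This
        # predictor has L_S(h) = 0, yet its true error can still be large.
        table = {x: y for x, y in S}
        return lambda x: table.get(x, 0)   # answers 0 on every unseen point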

2.3 Empirical risk minimization with inductive bias


A common way to fix ERM is to apply it over a restricted search space: the learner chooses a set of predictors (a hypothesis class H) in advance, before seeing the data. The ERM_H learner then uses the ERM rule to pick an h ∈ H that minimizes the empirical error on S.

Because this choice is made before the learner sees the data, it should be based on prior knowledge about the learning problem. Restricting the hypothesis class helps protect against overfitting, but it also introduces a stronger inductive bias.
One of the simplest restrictions on a class is to bound its cardinality (the number of hypotheses h in H). In machine learning it is assumed that the training samples in S are drawn independently from the same distribution D; even so, a sample may turn out to be completely unrepresentative of D. We denote the probability of drawing such an unrepresentative sample by δ, and call (1−δ) the confidence parameter.
Since we cannot expect perfectly accurate label prediction, we introduce a parameter that measures prediction quality, called the accuracy parameter and denoted ε. If L_{D,f}(h_S) ≤ ε, we regard the prediction as approximately correct.
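
For concreteness, here is a sketch (my own, using the toy papaya encoding from the snippets above) of an ERM_H learner over a small finite class: threshold rules on the softness feature, so the cardinality of H is bounded (|H| = 11):

    def empirical_error(h, S) -> float:
        # L_S(h): the fraction of training examples that h labels incorrectly.
        return sum(1 for x, y in S if h(x) != y) / len(S)

    def make_threshold_rule(t: float):
        # h_t(x) = 1 iff the papaya's softness is below the threshold t.
        return lambda papaya: 1 if papaya[1] < t else 0

    # A finite hypothesis class H = {h_t : t in {0.0, 0.1, ..., 1.0}}, |H| = 11.
    H = [make_threshold_rule(t / 10) for t in range(11)]

    def erm_H(S):
        # ERM_H: pick some h in H that minimizes the empirical error on S.
        return min(H, key=lambda h: empirical_error(h, S))

    # Usage on the tiny invented training set from earlier:
    S = [((0.7, 0.6), 1), ((0.2, 0.9), 0), ((0.8, 0.5), 1), ((0.1, 0.3), 0)]
    h_S = erm_H(S)
    print(empirical_error(h_S, S))

The restriction to threshold rules encodes prior knowledge, i.e. an inductive bias: the belief that tastiness is determined by softness alone, via a single cutoff.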

Misleading set: call the hypotheses in H whose true error exceeds ε the bad hypotheses, H_B = {h ∈ H : L_{D,f}(h) > ε}, and call a sample misleading if some bad hypothesis has zero empirical error on it; the set of misleading samples is M = {S|_x : ∃h ∈ H_B, L_S(h) = 0}.

Summary: for every ε and δ and a large enough sample size m, the hypothesis returned by the ERM_H rule over a finite hypothesis class is probably (with confidence 1−δ) approximately (with error at most ε) correct.
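
Quantitatively, this is the finite-class bound proved in this chapter of the book. The following LaTeX sketch records it for reference; it assumes, as in the book's analysis, that the m training points are drawn i.i.d. from D and that H contains a hypothesis with zero true error (the realizability assumption).

    % Union bound over the bad hypotheses H_B, then (1-\epsilon)^m \le e^{-\epsilon m}:
    \[
      \mathcal{D}^m\bigl(\{\, S|_x : L_{\mathcal{D},f}(h_S) > \epsilon \,\}\bigr)
      \;\le\; |\mathcal{H}_B|\,(1-\epsilon)^m
      \;\le\; |\mathcal{H}|\, e^{-\epsilon m}.
    \]
    % Consequently, if the sample size satisfies
    \[
      m \;\ge\; \frac{\log\bigl(|\mathcal{H}|/\delta\bigr)}{\epsilon},
    \]
    % then with probability at least 1-\delta over the choice of S, every
    % ERM_H output h_S satisfies L_{\mathcal{D},f}(h_S) \le \epsilon.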
