This comes down to a problem in mathematical probability.
Binary variable distribution:
The Bernoulli distribution is the 0-1 distribution (for example a coin toss, where μ is the probability of landing heads up).
Then the probability distribution of a single coin toss is:

p(x | μ) = μ^x (1 − μ)^(1−x),  x ∈ {0, 1},  so p(x = 1 | μ) = μ
Suppose the training data is:

D = {x_1, x_2, ..., x_N}
So, based on maximum likelihood estimation (MLE), we want to find μ:

μ_ML = argmax_μ p(D | μ)
The derivation of the estimate goes as follows. The likelihood of the data is

p(D | μ) = ∏_{n=1}^{N} p(x_n | μ) = ∏_{n=1}^{N} μ^{x_n} (1 − μ)^{1 − x_n}

so the log-likelihood is

ln p(D | μ) = Σ_{n=1}^{N} [ x_n ln μ + (1 − x_n) ln(1 − μ) ]

Setting its derivative with respect to μ to zero, we can find:

μ_ML = (1/N) Σ_{n=1}^{N} x_n = m / N

where m is the number of heads observed.
The above derivation is the maximum likelihood estimate, and we can see that μ_ML is simply the number of heads divided by the total number of coin tosses. But the maximum likelihood estimate has its limitations: when the training sample is small, it can cause overfitting. For example, if we toss a coin 10 times and it lands heads 8 times, then according to MLE the value of μ should be 8/10 (this is the frequentist point of view). How do we solve this problem?
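As a quick illustration of the frequency estimate, here is a minimal sketch in Python; the data is a made-up example matching the 8-out-of-10 scenario, not from the original:

```python
import numpy as np

# Hypothetical coin-toss data: 1 = heads, 0 = tails (8 heads out of 10 tosses).
x = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])

# Maximum likelihood estimate: mu_ML = m / N (heads count / total tosses).
mu_ml = x.mean()
print(mu_ml)  # 0.8 -- the raw frequency, which overfits when N is small
```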
At this point, we need to turn to Bayesian theory. In the Bayesian view, μ is not a fixed value; μ itself follows a distribution, so we assume μ has a prior distribution p(μ).
But how do we choose this prior distribution p(μ)?
We know the likelihood function has the form

p(D | μ) ∝ μ^m (1 − μ)^l

where m and l are the numbers of heads and tails. So we want a prior distribution with a similar functional form. Why? Because the posterior probability ∝ prior probability × likelihood function: if the chosen prior has the same structure as the likelihood function, the resulting posterior will also have that structure, which makes the later calculations simple.
Conjugacy: if the posterior distribution p(θ|x) belongs to the same distribution family as the prior distribution p(θ), the two are called conjugate distributions (and the prior is said to be conjugate to the likelihood).
So we assume the prior distribution of μ also has the form μ^(a−1) (1 − μ)^(b−1). There happens to be a distribution in mathematics with exactly this form, the Beta distribution:

Beta(μ | a, b) = [Γ(a + b) / (Γ(a) Γ(b))] μ^(a−1) (1 − μ)^(b−1)
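To make the Beta density concrete, here is a minimal check with scipy; the hyperparameters a = b = 2 are an arbitrary choice for illustration:

```python
from scipy.stats import beta

# Beta(a, b) density is proportional to mu^(a-1) * (1 - mu)^(b-1).
a, b = 2.0, 2.0              # hypothetical hyperparameters
print(beta.pdf(0.5, a, b))   # 1.5, the peak value of the Beta(2, 2) density
print(beta.mean(a, b))       # 0.5, i.e. a / (a + b)
```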
Now let's say we toss the coin and get m heads and l tails, a total of m + l = N trials. Then the posterior distribution of μ is:

p(μ | m, l, a, b) ∝ p(D | μ) Beta(μ | a, b) ∝ μ^(m+a−1) (1 − μ)^(l+b−1)

which, once normalized, is Beta(μ | m + a, l + b).
This posterior is still in the same family as the prior distribution (a conjugate distribution).
Suppose we want to predict the next experimental result, i.e. given D we want the predictive distribution of the next toss:

p(x = 1 | D) = ∫ p(x = 1 | μ) p(μ | D) dμ = E[μ | D] = (m + a) / (m + a + l + b)
We can see that as the number of observations N = m + l grows infinitely large, this estimate approaches the maximum likelihood estimate m/N.
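As a minimal sketch of this Beta-Bernoulli update (the prior hyperparameters a = b = 2 are an arbitrary assumption, not from the original):

```python
# Beta-Bernoulli conjugate update: prior Beta(a, b), observed m heads, l tails.
a, b = 2.0, 2.0   # hypothetical prior hyperparameters
m, l = 8, 2       # the 8-heads-out-of-10 example from above

# Posterior is Beta(m + a, l + b); the predictive p(x=1 | D) is its mean.
p_heads = (m + a) / (m + a + l + b)
print(p_heads)    # 10/14 ≈ 0.714, pulled from 0.8 toward the prior mean 0.5
```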
Multivariate variable distribution:
Most of the time a variable has more than two possible values; it can have many, and the estimation process is in fact similar. Suppose x is a K-dimensional vector in which exactly one element x_k = 1 and the others equal 0.
For example, rolling a die with a total of 6 faces: if face 2 comes up, then x_2 = 1 and x = (0, 1, 0, 0, 0, 0).
Then, letting the probability that x_k = 1 be μ_k, the distribution of x is:

p(x | μ) = ∏_{k=1}^{K} μ_k^{x_k},  where μ_k ≥ 0 and Σ_k μ_k = 1
Consider N independent observations D = {x_1, x_2, ..., x_N}; the corresponding likelihood function is:

p(D | μ) = ∏_{n=1}^{N} ∏_{k=1}^{K} μ_k^{x_nk} = ∏_{k=1}^{K} μ_k^{m_k}
Here m_k is simply the number of times x_k = 1 appears across all the trials. Carrying out the maximum likelihood estimation (maximizing subject to Σ_k μ_k = 1, using a Lagrange multiplier), we conclude that:

μ_k^{ML} = m_k / N
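A small sketch of this multivariate MLE, using a hypothetical set of one-hot die rolls:

```python
import numpy as np

# Hypothetical one-hot observations: N = 5 rolls of a 6-faced die.
X = np.array([[0, 1, 0, 0, 0, 0],   # face 2
              [0, 1, 0, 0, 0, 0],   # face 2
              [0, 0, 0, 1, 0, 0],   # face 4
              [1, 0, 0, 0, 0, 0],   # face 1
              [0, 1, 0, 0, 0, 0]])  # face 2

m = X.sum(axis=0)        # m_k: number of trials with x_k = 1
mu_ml = m / X.shape[0]   # mu_k(ML) = m_k / N
print(mu_ml)             # [0.2 0.6 0.  0.2 0.  0. ] -- unseen faces get probability 0
```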
Similarly, to avoid the overfitting problem caused by small data volumes, we also assume a prior distribution for the μ_k:
Considering the form of the likelihood ∏_k μ_k^{m_k} for the multivariate variable, we choose its conjugate distribution, the Dirichlet distribution, as the prior:

Dir(μ | α) = [Γ(α_0) / (Γ(α_1) ··· Γ(α_K))] ∏_{k=1}^{K} μ_k^{α_k − 1},  where α_0 = Σ_k α_k
Then the posterior distribution ∝ likelihood function × prior distribution:

p(μ | D, α) ∝ p(D | μ) Dir(μ | α) ∝ ∏_{k=1}^{K} μ_k^{m_k + α_k − 1}

which, once normalized, is Dir(μ | α + m) with counts m = (m_1, ..., m_K).
This posterior is still in the same family as the prior distribution (a conjugate distribution).
Suppose we want to predict the next experimental result, i.e. given D we want the predictive distribution:

p(x_k = 1 | D) = ∫ p(x_k = 1 | μ) p(μ | D) dμ = E[μ_k | D]
And because, for a Dirichlet distribution:

E[μ_k] = α_k / α_0
the predictive distribution for class k is:

p(x_k = 1 | D) = (m_k + α_k) / (N + α_0)
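And the corresponding Dirichlet update as a sketch, assuming a symmetric uniform prior α_k = 1 (an arbitrary choice) and reusing the hypothetical counts from the die-roll sketch above:

```python
import numpy as np

alpha = np.ones(6)                 # hypothetical uniform prior Dir(1, ..., 1)
m = np.array([1, 3, 0, 1, 0, 0])   # counts m_k from the N = 5 rolls above

# Posterior is Dir(alpha + m); predictive p(x_k=1 | D) = (m_k + alpha_k) / (N + alpha_0).
p_next = (m + alpha) / (m.sum() + alpha.sum())
print(p_next)  # [2/11 4/11 1/11 2/11 1/11 1/11] -- no face gets probability zero
```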
The distribution problem in machine learning (binary and multivariate variable distributions, Beta, Dirichlet)