Small Sample Analysis (II)


Having written the previous post and confirmed a few things, here is the follow-up. The formulas are a real pain to typeset, and Markdown handles them badly, so I have stripped the formulas out of the text and will just describe things in words.

Introduction:

Imagine a dark box full of balls (as many as you like). You draw a ball and it turns out to be red; what is the probability that the next ball you draw is also red? When I ran into questions like this before, I tended to refuse to answer them as "meaningless", because the sample is far too small to be statistically significant. But in natural language processing such questions have to be answered, and the usual answers are either the common estimates (described earlier), or some subjectively constructed correction formula that simply turned out to work well, or a probability obtained by training on a large known data set to optimize an objective function. These work, but they lack a mathematical basis.
Method of estimation: before we have drawn any ball, we know nothing about what is in the box, so the probability of drawing a red ball could be anything from 0 to 1. To satisfy the maximum-entropy assumption, the prior density is uniform; and to simplify the problem, we collapse the infinitely many possibilities into 11 cases: with probability 0.05 the box's red-ball probability is 0, with probability 0.1 it is 0.1, with probability 0.1 it is 0.2, ..., and with probability 0.05 it is 1. This is our original hypothesis: (essentially) uniform, and therefore of maximal entropy. Now we draw one ball and it is red, and we ask: what is the distribution over the 11 cases now? Anyone with a little knowledge of Bayes' rule can compute it. Then, if you ask "what is the probability of drawing a red ball from the black box?", the answer is easy: the probability of each of the 11 cases is known, the probability of drawing a red ball in each case is known, so you just take the expectation. Because the formulas are too hard to typeset here, I will only note that the same derivation can be rewritten in integral form and carried out again.
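A minimal Python sketch of this calculation, using the 11-case prior described above (0.05 at the endpoints, 0.1 everywhere else); the names and structure are illustrative only:

```python
import numpy as np

# The 11 hypotheses for the box's true red-ball probability: 0.0, 0.1, ..., 1.0.
p_red = np.linspace(0.0, 1.0, 11)

# The (near-)uniform prior described above: 0.05 at the endpoints, 0.1 in between.
prior = np.array([0.05] + [0.1] * 9 + [0.05])

def update(prior, reds, total):
    """Distribution over the 11 cases after drawing `reds` red balls in `total` draws."""
    likelihood = p_red ** reds * (1.0 - p_red) ** (total - reds)
    post = prior * likelihood
    return post / post.sum()

# One ball drawn, and it was red.
post = update(prior, reds=1, total=1)
print(post)                   # the 11 cases, reweighted by Bayes' rule
print((post * p_red).sum())   # expected probability that the next ball is red
```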
Results observed:
Drawing 25 red balls out of 50 samples and drawing 50 red balls out of 100 samples both give a probability of 50%, but the resulting distributions over the 11 cases are different: the larger sample is more certain about that 50%. The entropy of the distribution over the 11 cases can serve as our measure of the uncertainty of the final probability (the larger the entropy, the greater the uncertainty). Even if we have drawn only one red ball, we can still give a minimum-risk estimate, while also knowing how uncertain we are about it.

As the number of samples increases, the uncertainty decreases (to be verified), and the final probability gets closer to the raw sample frequency. This matches everyday experience.
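A quick numerical check of both observations, reusing the same 11-case prior (an illustrative sketch, not the original computation):

```python
import numpy as np

p_red = np.linspace(0.0, 1.0, 11)
prior = np.array([0.05] + [0.1] * 9 + [0.05])

def posterior_entropy(reds, total):
    """Entropy of the distribution over the 11 cases after observing the sample."""
    post = prior * p_red ** reds * (1.0 - p_red) ** (total - reds)
    post /= post.sum()
    return -(post * np.log(post + 1e-300)).sum()

# 25/50 and 50/100 both give an expected red-ball probability near 0.5,
# but the larger sample leaves a sharper (lower-entropy) distribution over the 11 cases.
print(posterior_entropy(25, 50), posterior_entropy(50, 100))

# Even a single red ball gives a usable estimate, just with high remaining uncertainty.
print(posterior_entropy(1, 1))
```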

Advanced -- bringing prior knowledge into the original hypothesis:

From life experience: there are many colors in the world, so red balls surely make up less than half of all balls (hardly a single table-tennis ball is red). Let us add the prior knowledge that the probability of a red ball is 0.1: roughly 1 in every 10 balls in the world is red. If we have never drawn a ball and know nothing else about the black box, our best guess is 0.1. But then what should the distribution over the 11 cases above look like? First, its mean must be 0.1; second, we want minimum risk, which means maximum entropy of the distribution. Under these two requirements we can construct a "reasonable" distribution (if you ask why maximum entropy means minimum risk, I don't know where to begin, so let's just take it as true). As for how to derive this distribution, you can first look at the dice example in "Maximum entropy Language with non-local dependencies" (WU); that is where I learned it. Finally, the same construction can be rewritten in integral form and derived again.
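One way to construct such a distribution numerically: the maximum-entropy distribution over the 11 cases with a fixed mean has the exponential form p_i proportional to exp(lam * x_i), so it suffices to solve for lam. The sketch below does this by bisection; it illustrates the construction rather than reproducing the book's derivation:

```python
import numpy as np

# The same 11 possible red-ball probabilities as before.
p_red = np.linspace(0.0, 1.0, 11)

def maxent_with_mean(target_mean):
    """Maximum-entropy distribution over the 11 cases whose mean equals target_mean.
    The solution is p_i proportional to exp(lam * x_i); lam is found by bisection."""
    def mean_for(lam):
        w = np.exp(lam * p_red)
        w /= w.sum()
        return (w * p_red).sum()

    lo, hi = -200.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if mean_for(mid) < target_mean:
            lo = mid   # mean too small: increase lam
        else:
            hi = mid
    w = np.exp(lo * p_red)
    return w / w.sum()

prior_01 = maxent_with_mean(0.1)
print(prior_01)                   # the "reasonable" prior with mean 0.1
print((prior_01 * p_red).sum())   # approximately 0.1
```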

Summary:

In fact the idea is very simple: when the sample is small, estimating a probability has to borrow outside knowledge, and to borrow it in a way that reduces risk, we consider every possibility that can occur. As some schools of quantum mechanics put it: an electron is a wave, but the moment you look at it, it becomes a point. Every possibility has, in a sense, happened; what we see is only the macroscopic state.

Simplification -- the feature-degradation method:

Back to the example above. We used entropy and other machinery to produce a distribution, which may not be very intuitive and is troublesome to compute (if you use numerical methods you must watch out for precision loss: if the box's red-ball probability is only 0.01, then at least one of your discretized cases has to lie below 0.01, not counting 0). So-called feature degradation erases individuality. Instead of asking "what is the probability of a red ball inside this particular black box?", we step back and ask "what is the probability of a red ball inside a box?". We could survey all the boxes in the world, or only the boxes that resemble this one; suppose we have n known boxes that we are allowed to open. We simply assume that the black box behaves like each of the known boxes with probability 1/n, which gives an average probability. Now, what feature does this black box have? That one ball drawn from it was red. Applying Bayes' rule again, we redistribute the probability that the black box is each of the known boxes, and finally give the expected probability of drawing a red ball.
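A small sketch of the degraded estimate, assuming n known boxes with made-up red-ball proportions (the numbers are illustrative, not from the text):

```python
import numpy as np

# Red-ball proportions of n known boxes that we are allowed to open (made-up values).
known_probs = np.array([0.05, 0.10, 0.20, 0.15, 0.30])

# Step 1: assume the black box behaves like each known box with probability 1/n.
weights = np.full(len(known_probs), 1.0 / len(known_probs))

# Step 2: the black box's only feature is that one draw from it was red.
# Reweight each known box by how well it explains that draw (Bayes' rule).
weights *= known_probs            # P(red | box)
weights /= weights.sum()

# Step 3: expected probability that the next draw from the black box is red.
print((weights * known_probs).sum())
```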

An example:

Someone once asked how to compute a probability from 30 samples. Here is an example, but first some plain talk: no matter how clever you are or how many tricks you play, what is inherently uncertain stays uncertain; all we can change is how much risk we take. Take natural language processing as an example: I want to estimate the probability that the word "method" appears after the word "degenerate". "Degenerate" occurred 30 times, and "method" followed it 2 times. First we degrade the specific word "degenerate" to just "a word", and count the probability of "method" following across all words (we can restrict to words that appear more than 300 times). Following the same approach as before, we assume "degenerate" behaves like each of those words with equal probability, then reweight those probabilities using the observations for "degenerate", and finally report the average over all the possibilities.
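A sketch of the same procedure for the bigram case: the 30 and 2 come from the example above, while the candidate words and their counts are made up for illustration:

```python
import numpy as np

# Candidate frequent context words that "degenerate" is degraded to.
# Each row: [count of "method" after the word, total occurrences of the word].
# These counts are made up; only the 2-out-of-30 observation below is from the example.
candidates = np.array([
    [50.0, 3000.0],
    [10.0, 2000.0],
    [200.0, 5000.0],
])
p_method = candidates[:, 0] / candidates[:, 1]   # P("method" | candidate word)

# Observation for "degenerate": 30 occurrences, "method" followed it 2 times.
hits, total = 2, 30

# Equal weights first, then reweight each candidate by how well it explains
# the observation (binomial likelihood), just as with the boxes above.
weights = np.full(len(candidates), 1.0 / len(candidates))
weights *= p_method ** hits * (1.0 - p_method) ** (total - hits)
weights /= weights.sum()

# Final estimate: the average of P("method") over the reweighted candidates.
print((weights * p_method).sum())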

Practice:

That is what I did in a Bayesian text classification task I worked on (9 classes, 2000 posts per class; words that appeared only once were simply removed, with no further justification, because there were too many of them to fit in memory). I tried both the entropy method and the degraded-feature method. On the test set the accuracy was about 85.5% for both, with the entropy method slightly better; on the training set it was above 98%. Bayes is a statistical model, so I cannot simply call this overfitting (after all, naive Bayes does not fit anything); perhaps the training set and the test set just differ somewhat, so the borrowed knowledge does not fully carry over.


(It seems that, even without the formulas, this is easier to explain clearly.)
