Data types and data distribution

Source: Internet
Author: User

1. Discrete data and discrete distributions

Discrete data is data that can usually only be represented by integers, for example the number of people in a province or the number of planets per unit volume of the universe.

1.1 Common discrete distributions in statistics

1. Degenerate distribution: a random variable X takes a constant a with probability 1, i.e. P{X = a} = 1. X is then said to follow the degenerate distribution at a, also called the deterministic distribution.

2. Two-point distribution: a random variable X has only two possible values, with distribution P{X = x1} = p, P{X = x2} = 1 - p, 0 < p < 1. X is then said to follow the two-point distribution on x1, x2 with parameter p. When X takes only the two values 0 and 1, with P{X = 1} = p and P{X = 0} = 1 - p, 0 < p < 1, X is said to follow the 0-1 distribution with parameter p; its mean is EX = p and its variance is DX = p(1 - p). Example: tossing a coin.

3. Uniform distribution on n points: a random variable X takes n distinct values with probability distribution P{X = xi} = 1/n (i = 1, 2, 3, ..., n). X is then said to follow the uniform distribution on the n points {x1, x2, ..., xn}. Example: throwing a die. This distribution is common in classical probability models.

4. Binomial distribution: the probability distribution of the number of successes k in n independent Bernoulli trials. (Whether an experiment is a sequence of Bernoulli trials depends on two things: the probability of event A is the same in every trial, and the result of each trial is unrelated to the results of the others. "Bernoulli trials" refers to a series of repeated trials rather than a single one, and earlier outcomes have no effect on the probability of the event in later trials.)

5. Geometric distribution: in a sequence of Bernoulli trials in which A is an event, let X be the number of trials performed until A first occurs. Example: the number of draws (with replacement) from a bag until a red ball is drawn.

6. Hypergeometric distribution: a bag holds N balls, of which N1 are white and N2 are black. Draw n balls from it without replacement and let X be the number of white balls drawn; then the distribution of X is P{X = k} = C(N1, k) C(N2, n - k) / C(N, n).

7. Poisson distribution:

The number of calls a telephone exchange receives in a given time, the number of customers arriving at a ticket gate, the number of claims an insurance company receives in a given period, the number of people arriving at a service facility within a certain time, the number of passengers waiting at a bus platform, the number of machine faults, the number of natural disasters, the number of defects on a piece of product, and the number of bacteria in a unit area under the microscope can all be described approximately by the Poisson distribution. When an event occurs randomly and independently at a fixed average instantaneous rate λ (or density), the number of occurrences of the event per unit time (area or volume) approximately follows the Poisson distribution. The Poisson distribution is a limiting case of the binomial distribution, obtained by letting n tend to infinity while np stays fixed at λ. For an understanding of the Poisson distribution, see Nanyi's understanding of Poisson distribution.
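As a rough illustration of the distributions listed above, the sketch below writes four of their probability mass functions in plain Python and checks that each sums to (approximately) 1. The function names and the parameter values are assumptions chosen for the example, not something the source specifies.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P{X = k}: k successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    """P{X = k}: the first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p)**(k - 1) * p

def hypergeometric_pmf(k, N, N1, n):
    """P{X = k}: k white balls among n drawn without replacement
    from N balls, of which N1 are white."""
    if k < 0 or k > n:
        return 0.0
    return comb(N1, k) * comb(N - N1, n - k) / comb(N, n)

def poisson_pmf(k, lam):
    """P{X = k} for a Poisson distribution with rate lam."""
    return exp(-lam) * lam**k / factorial(k)

# Illustrative parameters (assumed); infinite supports are truncated
# where the remaining probability mass is negligible.
totals = {
    "binomial": sum(binomial_pmf(k, 10, 0.3) for k in range(11)),
    "geometric": sum(geometric_pmf(k, 0.3) for k in range(1, 201)),
    "hypergeometric": sum(hypergeometric_pmf(k, 20, 8, 5) for k in range(6)),
    "poisson": sum(poisson_pmf(k, 3.0) for k in range(60)),
}

for name, total in totals.items():
    print(f"{name}: total probability = {total:.12f}")
```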

1.2 The linkage between discrete distributions    

The binomial, geometric, and Pascal (negative binomial) distributions are all based on independent Bernoulli trials.


Binomial distribution: describes the probability of x successes in a given n trials.

Geometric distribution: describes the probability that the first success occurs on the x-th trial.

Pascal distribution: the positive-integer form of the negative binomial distribution. In a series of Bernoulli trials in which an event occurs with probability p on each trial, it describes the probability that the event occurs for the r-th time exactly on trial r + k. The geometric distribution is therefore the special case of the Pascal distribution with r = 1.
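The special-case relationship can be checked numerically. The sketch below assumes the usual parameterization in which the Pascal probability of the r-th success on trial r + k is C(r + k - 1, k) p^r (1 - p)^k; the function names and p = 0.25 are illustrative.

```python
from math import comb

def geometric_pmf(x, p):
    """P{first success occurs on trial x}, x = 1, 2, ..."""
    return (1 - p)**(x - 1) * p

def pascal_pmf(k, r, p):
    """P{r-th success occurs on trial r + k}, i.e. k failures come first."""
    return comb(r + k - 1, k) * p**r * (1 - p)**k

p = 0.25  # assumed success probability for the demonstration
for x in range(1, 6):
    # With r = 1, "first success on trial x" means k = x - 1 failures.
    print(x, geometric_pmf(x, p), pascal_pmf(x - 1, 1, p))
```

Each printed row should show the same value twice, since with r = 1 the binomial coefficient equals 1.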

Hypergeometric distribution: describes sampling without replacement from a finite population. The population has N individuals, of which M have a certain characteristic; if n individuals are drawn, it gives the probability that x of them have the characteristic. With the hypergeometric distribution we often want to infer N (with M known) or M (with N known). For example, to find out how many fish are in a river, catch M fish, mark them, and release them; after some time, when the marked fish can be assumed to be evenly dispersed in the water, catch n fish, of which x turn out to be marked, and from this infer the total number of fish N.
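The source does not say which estimator is used for the fish example. One common choice, assumed here, is to equate the marked fraction in the sample with the marked fraction in the river, x / n ≈ M / N, which gives N ≈ M n / x. The sketch also scans candidate values of N for the one that maximizes the hypergeometric likelihood; all names and numbers are hypothetical.

```python
from math import comb

def hypergeometric_pmf(x, N, M, n):
    """P{x marked among n caught}, given N fish of which M are marked."""
    if x < 0 or x > n or x > M or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def estimate_population(M, n, x):
    """Proportion-based estimate: x / n ~ M / N, so N ~ M * n / x."""
    if x == 0:
        raise ValueError("no marked fish recaptured; N cannot be estimated")
    return M * n / x

def most_likely_population(M, n, x, n_max):
    """Scan N up to n_max and return the value with the highest likelihood."""
    smallest = M + (n - x)  # at least M marked plus the unmarked fish seen
    return max(range(smallest, n_max + 1),
               key=lambda N: hypergeometric_pmf(x, N, M, n))

# Hypothetical survey: 100 fish marked, 60 caught later, 7 of them marked.
M, n, x = 100, 60, 7
print("proportion estimate:", estimate_population(M, n, x))
print("likelihood scan:    ", most_likely_population(M, n, x, n_max=2000))
```

The two answers agree up to rounding, because the likelihood keeps increasing as long as N is below M n / x and decreases afterwards.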

Hypergeometric distribution vs. binomial distribution: both describe sampling, except that the hypergeometric distribution is sampling without replacement and the binomial distribution is sampling with replacement. When the population size N is very large and the sample size n is very small relative to it, sampling without replacement is approximately the same as sampling with replacement, so the hypergeometric distribution can be approximated by a binomial distribution with p = M/N.
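A small numerical comparison makes the approximation concrete. The sketch below measures the largest gap between the two probability mass functions for a small and a large population with the same marked fraction; the population sizes and the sample size n = 10 are assumed values for illustration.

```python
from math import comb

def hypergeometric_pmf(x, N, M, n):
    """Sampling without replacement: x of the n drawn have the characteristic."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    """Sampling with replacement: x successes in n draws."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def max_abs_gap(N, M, n):
    """Largest pointwise difference between the two distributions."""
    p = M / N
    return max(abs(hypergeometric_pmf(x, N, M, n) - binomial_pmf(x, n, p))
               for x in range(n + 1))

small_population_gap = max_abs_gap(N=50, M=15, n=10)
large_population_gap = max_abs_gap(N=50_000, M=15_000, n=10)

print("N = 50:     max gap =", small_population_gap)
print("N = 50,000: max gap =", large_population_gap)
```

The gap shrinks as N grows with n fixed, which is the regime described above.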

Poisson distribution vs. binomial distribution:

The Poisson distribution can be used to approximate the binomial distribution. When n is very large, p is very small, and np is a number of moderate size, Poisson(np) can be used in place of the binomial distribution: Binomial(x; n, p) ≈ Poisson(x; np).
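The approximation can be tried on numbers of the size used in the station example, n = 100,000 and p = 0.001, asking for the probability of exactly 150 arrivals. The sketch works in log space with `math.lgamma`, an implementation choice made here because the binomial coefficient C(100000, 150) is far too large to convert to a float; the function names are illustrative.

```python
from math import exp, lgamma, log, log1p

def log_binomial_pmf(k, n, p):
    """log P{X = k} for Binomial(n, p), computed without huge integers."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_comb + k * log(p) + (n - k) * log1p(-p)

def log_poisson_pmf(k, lam):
    """log P{X = k} for Poisson(lam)."""
    return -lam + k * log(lam) - lgamma(k + 1)

n, p, k = 100_000, 0.001, 150
exact = exp(log_binomial_pmf(k, n, p))
approx = exp(log_poisson_pmf(k, n * p))  # lambda = np = 100

print("binomial:", exact)
print("poisson: ", approx)
print("ratio:   ", exact / approx)
```

The two values differ by only about one percent, while the Poisson form needs nothing beyond λ = np.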

For example, in a city of 100,000 people, suppose each person goes to a certain station within one hour with probability 0.001. How many people will arrive at the station within the hour? This is a binomial distribution with n = 100,000 and p = 0.001, and the expected number is np = 100 people. The probability that 150 people arrive within the hour can of course be computed with the binomial distribution, but the binomial coefficients involved are unwieldy, so the Poisson approximation can be used instead: the number of arrivals at the station in one hour follows a Poisson distribution with λ = np = 100.

That is, the Poisson distribution is often used for a large population in which the probability of the event is very small for each individual (while the expected number of events in the whole population, np, is not small); it gives the probability that the event occurs x times in the population over a period of time. The number of occurrences obviously depends on the length of the period, through λ = np.

If X follows the Poisson distribution, it should meet the three conditions of a Poisson process: stationarity, independence, and orderliness. (Basis of probability theory, Fudan University, Li Xianping, page 99)

Stationarity (smoothness) means that the number of events occurring in a period of time does not depend on where the period starts, only on its length.

Independence means that the numbers of events in disjoint time intervals are mutually independent.

Orderliness (ordinariness) means that two or more events cannot occur at the same instant.

Obviously, these three conditions may not be satisfied in reality. Take the number of calls arriving over a period of time: two calls may arrive at the same moment (a busy line), and the arrivals may be uneven, for example more calls in the daytime than at night.

The geometric distribution is memoryless, since each trial is independent and unaffected by previous results. Note that among the continuous distributions, the exponential distribution is also memoryless.
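Memorylessness means P{X > m + n | X > m} = P{X > n}: having already waited m trials does not change the distribution of the remaining wait. The sketch below checks this identity numerically for the geometric distribution and for the exponential distribution; the parameter values are arbitrary assumptions.

```python
from math import exp

def geometric_tail(n, p):
    """P{X > n}: no success in the first n Bernoulli trials."""
    return (1 - p)**n

def exponential_tail(t, rate):
    """P{T > t} for an exponential distribution with the given rate."""
    return exp(-rate * t)

# Geometric: condition on having already seen m failures.
p, m, n = 0.2, 3, 5
geo_conditional = geometric_tail(m + n, p) / geometric_tail(m, p)
geo_unconditional = geometric_tail(n, p)

# Exponential: condition on having already waited s time units.
rate, s, t = 0.7, 2.0, 1.5
exp_conditional = exponential_tail(s + t, rate) / exponential_tail(s, rate)
exp_unconditional = exponential_tail(t, rate)

print("geometric:  ", geo_conditional, geo_unconditional)
print("exponential:", exp_conditional, exp_unconditional)
```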

2. Continuous data and continuous distribution

Continuous data is data that can take any value within an interval. Its values are continuous: the gap between two adjacent values can be subdivided indefinitely and still be meaningful, so there are infinitely many possible values. The distribution mentioned most often in statistics is the normal distribution. It's important!

2.1 Common continuous distributions in statistics

1. Uniform distribution
2. Normal distribution, including the standard normal distribution
3. χ2 (chi-square) distribution
4. F distribution
5. t distribution
6. Exponential distribution (note the difference from the power-law distribution)
7. γ (gamma) distribution
8. Weibull distribution
9. β (beta) distribution

2.2 Connection between continuous distributions

The normal distribution is the core of statistics. According to the law of large numbers and the central limit theorem, the binomial distribution and the Poisson distribution can both be approximated by a normal distribution as n approaches infinity. Among the continuous distributions, the χ2 (chi-square), t, and F distributions are all derived from the normal distribution (the standard normal distribution).

The exponential distribution and the power-law distribution look very similar in shape. http://blog.sina.com.cn/s/blog_8f48f45301015ofs.html points out that, compared with the exponential curve, the power-law curve is higher at both ends and lower in the middle, and falls faster than the exponential in its first half.
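As a rough check of the normal approximation to the binomial distribution, the sketch below compares an exact binomial cumulative probability with the normal value at the same point. The continuity correction (evaluating the normal CDF at k + 0.5) is a standard refinement added here, not something the source mentions; n = 1000, p = 0.3, and k = 310 are assumed values.

```python
from math import erf, exp, lgamma, log, log1p, sqrt

def binomial_pmf(k, n, p):
    """P{X = k} for Binomial(n, p), computed in log space."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + k * log(p) + (n - k) * log1p(-p))

def binomial_cdf(k, n, p):
    """P{X <= k} by direct summation."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

def normal_cdf(x, mean, sd):
    """P{Z <= x} for a normal distribution with the given mean and sd."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

n, p, k = 1000, 0.3, 310
mean, sd = n * p, sqrt(n * p * (1 - p))

exact = binomial_cdf(k, n, p)
approx = normal_cdf(k + 0.5, mean, sd)  # continuity correction

print("binomial P{X <= 310}:", exact)
print("normal approximation:", approx)
```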
