"Data analysis/mining bottom-up algorithm": original implementation and application of the binomial distribution algorithm


7.2 Binomial distribution algorithm

Author Bai Ningsu

August 15, 2015 22:51:38

Abstract: This article studies the geometric, binomial, and Poisson distributions from statistics, introducing the basic concepts and core knowledge of each. The goal is to implement each distribution and apply it to real scenarios. The binomial distribution applies when a series of independent trials is carried out, each trial either succeeds or fails, the probability of success is the same in every trial, and the quantity of interest is the number of successes. For n Bernoulli trials, the algorithm computes the probability of exactly r successes, the expectation of the binomial distribution, and its variance.

Catalogue

Applications of discrete probability distributions in statistics

Geometric, binomial, and Poisson distributions in statistics

Geometric distribution algorithm in statistics

Chi-square distribution in statistics

7.2.1 Basic description of the algorithm, including its definition, symbol interpretation, and calculation method

7.2.1.1 Algorithm description

This algorithm applies when a series of independent trials is carried out, each trial either succeeds or fails, the probability of success is the same in every trial, and the quantity of interest is the number of successes across the series.

For n Bernoulli trials, the algorithm computes the probability of exactly r successes, the expectation of the binomial distribution, and its variance.

7.2.1.2 Definition

In a series of mutually independent events, the probability of answering each question correctly is p and the probability of answering incorrectly is q = 1 - p. The probability of answering exactly r of n questions correctly is $P(X = r) = \binom{n}{r} p^{r} q^{n-r}$. A problem of this type follows a binomial distribution, written $X \sim B(n, p)$.

7.2.1.3 Symbol interpretation

n: the total number of trials

X: the number of successes in n independent trials

p: the probability of success in a single independent trial

q: the probability of failure in a single independent trial, q = 1 - p

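7.2.1.4 Calculation method

Assume n Bernoulli trials in which the probability of success is p and the probability of failure is q = 1 - p, and let r be the number of successes:

1. Probability of exactly r successes: $P(X = r) = \binom{n}{r} p^{r} q^{n-r}$, where $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$

2. Expectation: $E(X) = np$

3. Variance: $\mathrm{Var}(X) = npq$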
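The formulas P(X = r) = C(n, r) p^r q^(n-r), E(X) = np, and Var(X) = npq can be sketched directly in Java. This is a minimal illustration, not the author's implementation: the class name `BinomialSketch` and its method names are assumptions made for this example, and the combination count is computed multiplicatively rather than through factorials.

```java
// Minimal sketch of the binomial formulas; all names here are illustrative.
class BinomialSketch {
    // C(n, r), built up multiplicatively so no factorial is ever formed.
    static double choose(int n, int r) {
        double c = 1.0;
        for (int i = 1; i <= r; i++) {
            c = c * (n - r + i) / i;
        }
        return c;
    }

    // P(X = r) = C(n, r) * p^r * q^(n - r), with q = 1 - p.
    static double pmf(int n, int r, double p) {
        double q = 1.0 - p;
        return choose(n, r) * Math.pow(p, r) * Math.pow(q, n - r);
    }

    // E(X) = n * p
    static double mean(int n, double p) {
        return n * p;
    }

    // Var(X) = n * p * q
    static double variance(int n, double p) {
        return n * p * (1.0 - p);
    }

    public static void main(String[] args) {
        System.out.println("P(X = 2) = " + pmf(5, 2, 0.25));
        System.out.println("E(X)     = " + mean(5, 0.25));
        System.out.println("Var(X)   = " + variance(5, 0.25));
    }
}
```

With n = 5 and p = 0.25 this gives a probability of about 0.2637 for two successes, an expectation of 1.25, and a variance of 0.9375.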

7.2.2 Application scenario of the algorithm. Taking a scenario from a book or from practice as an example, this section describes how the algorithm is used, including the definition of the algorithm in that scenario, the specific meaning of its symbols, and the calculation method.

7.2.2.1 Algorithm description in this scenario

Case description: A quiz game has 5 questions in total. The probability of answering each question correctly is 0.25, and each answer is assumed to be an independent event.

7.2.2.2 Algorithm definition in this scenario

Case definition: Each answer independently succeeds with probability 0.25 and fails with probability 0.75. Given that there are 5 questions in total, the probability of any particular number of correct answers can be computed.

7.2.2.3 Symbol interpretation in this scenario

n: the total number of trials, n = 5.

X: the number of successes in n independent trials, X = r.

p: the probability of success in a single independent trial, p = 0.25.

q: the probability of failure in a single independent trial, q = 1 - p = 0.75.

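7.2.2.4 Calculation method in this scenario

Case calculation: Assume n = 5 Bernoulli trials (5 questions in total) in which the probability of success is p = 0.25 and the probability of failure is q = 1 - p = 0.75:

1. The probability of answering two questions correctly: $P(X = 2) = \binom{5}{2} \times 0.25^{2} \times 0.75^{3} = 10 \times 0.0625 \times 0.421875 \approx 0.2637$

2. The probability of a wrong answer:

4. Expectation: $E(X) = np = 5 \times 0.25 = 1.25$

5. Variance: $\mathrm{Var}(X) = npq = 5 \times 0.25 \times 0.75 = 0.9375$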
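To see the quiz case as a whole, the sketch below tabulates P(X = r) for every r from 0 to 5 with n = 5 and p = 0.25, then checks that the probabilities sum to 1 and that the weighted mean equals np. The class `QuizDistribution` and its methods are hypothetical names introduced only for this illustration.

```java
// Sketch: the full distribution of correct answers for the 5-question quiz.
// Class and method names are assumptions made for this example.
class QuizDistribution {
    // C(n, r), computed multiplicatively.
    static double choose(int n, int r) {
        double c = 1.0;
        for (int i = 1; i <= r; i++) {
            c = c * (n - r + i) / i;
        }
        return c;
    }

    // Returns an array whose element r holds P(X = r) for r = 0..n.
    static double[] distribution(int n, double p) {
        double[] probs = new double[n + 1];
        for (int r = 0; r <= n; r++) {
            probs[r] = choose(n, r) * Math.pow(p, r) * Math.pow(1.0 - p, n - r);
        }
        return probs;
    }

    // Sum of r * P(X = r); for a binomial distribution this equals n * p.
    static double meanOf(double[] probs) {
        double mean = 0.0;
        for (int r = 0; r < probs.length; r++) {
            mean += r * probs[r];
        }
        return mean;
    }

    public static void main(String[] args) {
        double[] probs = distribution(5, 0.25);
        double total = 0.0;
        for (int r = 0; r < probs.length; r++) {
            total += probs[r];
            System.out.printf("P(X = %d) = %.4f%n", r, probs[r]);
        }
        System.out.printf("sum = %.4f, mean = %.4f%n", total, meanOf(probs));
    }
}
```

Listing the whole distribution makes it easy to answer related questions, such as the probability of at least two correct answers, by summing the relevant entries.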

7.2.3 Advantages and disadvantages of the algorithm, the scenarios it does and does not apply to, and the data types it applies to

7.2.3.1 Advantages of this algorithm

Advantages: When the number of trials is fixed and the quantity of interest is the number of successes, the geometric distribution does not fit, whereas the binomial distribution handles such problems well.

7.2.3.2 Disadvantages of this algorithm

Disadvantages: When the number of trials is not fixed and the quantity of interest is how often an event occurs, neither the geometric nor the binomial distribution applies; this is where the Poisson distribution has the advantage.

7.2.3.3 Scenarios this algorithm suits

n trials are carried out under identical conditions, the outcome of each trial is independent of the others, and each trial has exactly two mutually exclusive outcomes. The goal is to find the number of successes in the n trials.

7.2.3.4 Scenarios this algorithm does not suit

The binomial distribution does not apply to non-independent trials, nor to independent trials where the quantity of interest is the probability of the first success.
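The difference between "first success" and "number of successes" can be made concrete. The sketch below compares the geometric probability that the first success happens on trial r, q^(r-1) p, with the binomial probability of exactly one success somewhere in r trials, r p q^(r-1). The class and method names are hypothetical and serve only this comparison.

```java
// Sketch contrasting a first-success question (geometric distribution)
// with a count-of-successes question (binomial distribution).
// All names are illustrative assumptions.
class GeometricVsBinomial {
    // Geometric: P(first success occurs on trial r) = q^(r - 1) * p.
    static double firstSuccessAt(int r, double p) {
        return Math.pow(1.0 - p, r - 1) * p;
    }

    // Binomial: P(exactly k successes in n trials) = C(n, k) * p^k * q^(n - k).
    static double exactlyKSuccesses(int n, int k, double p) {
        double c = 1.0;
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;
        }
        return c * Math.pow(p, k) * Math.pow(1.0 - p, n - k);
    }

    public static void main(String[] args) {
        double p = 0.25;
        // The first correct answer arrives exactly on the third question.
        System.out.println("geometric: " + firstSuccessAt(3, p));
        // Exactly one correct answer among three questions, in any position.
        System.out.println("binomial:  " + exactlyKSuccesses(3, 1, p));
    }
}
```

The binomial value is three times the geometric one here because the single success may fall on any of the three trials, while the geometric question pins it to the last.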

7.2.3.5 Data types the algorithm applies to

This algorithm works with data of type double. Results are rounded to two decimal places by default, and the number of retained places can be configured.

7.2.4 Input data, intermediate results, and graphical presentation of the output of the algorithm

7.2.4.1 Input data of this algorithm

* @param n int, the total number of independent trials

* @param r int, the number of successes

* @param p double, the probability of success

7.2.4.2 Intermediate results of this algorithm

In the method rsucess:

* @param q double, the probability of failure, 1 - p

* @param ncr double, the number of combinations of r successes among n trials

7.2.4.3 Output of this algorithm

* @return px double, rounded to a fixed number of decimal places, the probability of r successes in n trials

* @return ex double, rounded to a fixed number of decimal places, the expectation of the binomial distribution

* @return vx double, rounded to a fixed number of decimal places, the variance of the binomial distribution

7.2.4.4 Graphical display of this algorithm

7.2.5 Abnormal states that can arise in the calculation, and workable workarounds. This section describes, on the one hand, situations in which the calculation may produce incorrect results or errors, and on the other hand, the types of exception raised in the program.

7.2.5.1 Possible exceptions and errors of this algorithm

Exception 1: The input data is invalid, for example a letter is entered where a double is required.

Exception 2: The input data is too large for the computation to handle.

Error 1: Truncating the digits of the computed number of combinations introduces a slight error.

Error 2: Rounding to a fixed number of decimal places makes the result inexact.

7.2.5.2 Handling of exceptions and errors in this algorithm

Exception 1: Resolved by showing a hint when the input is invalid.

Exception 2: Resolved by catching the exception.

Error 1: Resolved by avoiding rounding in intermediate calculations wherever possible, which reduces the effect of the error.

Error 2: Resolved by a helper that rounds to a configurable number of decimal places, set according to the precision required.
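One way these safeguards might look in code is sketched below. It is an illustration under stated assumptions, not the author's implementation: `SafeBinomial`, `parseProbability`, and `validate` are hypothetical names. Invalid text input is turned into a readable hint, arguments are range-checked, the combination count is built multiplicatively so that large n does not overflow a factorial, and no rounding is applied to intermediate values.

```java
// Sketch of input validation and an overflow-resistant combination count.
// All names are assumptions made for this example.
class SafeBinomial {
    // Exception 1: convert invalid text into a hint instead of a raw stack trace.
    static double parseProbability(String text) {
        try {
            return Double.parseDouble(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Please enter a number between 0 and 1, got: " + text);
        }
    }

    // Reject arguments for which the binomial formula is undefined.
    static void validate(int n, int r, double p) {
        if (n < 0 || r < 0 || r > n) {
            throw new IllegalArgumentException("require 0 <= r <= n");
        }
        if (Double.isNaN(p) || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("require 0 <= p <= 1");
        }
    }

    // Exception 2: C(n, r) without factorials; n! exceeds the long range once n > 20.
    static double choose(int n, int r) {
        int k = Math.min(r, n - r); // C(n, r) == C(n, n - r), so use the smaller index
        double c = 1.0;
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;
        }
        return c;
    }

    // Error 1: intermediate values stay unrounded; round only when displaying.
    static double pmf(int n, int r, double p) {
        validate(n, r, p);
        return choose(n, r) * Math.pow(p, r) * Math.pow(1.0 - p, n - r);
    }

    public static void main(String[] args) {
        System.out.println(pmf(30, 15, parseProbability("0.5")));
        try {
            parseProbability("abc");
        } catch (IllegalArgumentException e) {
            System.out.println(">> " + e.getMessage());
        }
    }
}
```

Keeping validation separate from the formula lets callers decide how to present the hint, while the calculation itself stays free of display concerns.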

7.2.6 Code reference for the algorithm. This part gives only a basic description of the relevant classes and methods: the class names and the methods to call.

7.2.6.1 Basic description of classes and methods

Class source: see the source program distributes.src.disttools.Binodist

For n Bernoulli trials, this class implements the probability of r successes, the expectation of the binomial distribution, and the variance of the binomial distribution.

7.2.6.2 Class and method invocation interfaces

See the source program distributes.src.disttools.Binodist

Binodist.java contains the following methods:

rsucess(int n, int r, double p) // probability of r successes in n independent trials

expectation(int n, double p) // expectation of the binomial distribution

variance(int n, double p) // variance of the binomial distribution

Helper methods called, from Numformat.java:

decformat(int n, double num) // rounds num to n decimal places

factorial(int num) // factorial of num
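The source of Numformat.java is not shown in this article, so the following is only a guess at what the two helpers might look like, written to match the signatures listed above. The class name `NumformatSketch`, the half-up rounding mode, and the double return type of `factorial` are all assumptions.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical stand-in for the Numformat helpers; not the author's code.
class NumformatSketch {
    // Round num to n decimal places (half-up rounding is an assumption).
    static double decformat(int n, double num) {
        return new BigDecimal(Double.toString(num))
                .setScale(n, RoundingMode.HALF_UP)
                .doubleValue();
    }

    // num! returned as a double, so dividing factorials is not integer division.
    static double factorial(int num) {
        if (num < 0) {
            throw new IllegalArgumentException("factorial requires num >= 0");
        }
        double result = 1.0;
        for (int i = 2; i <= num; i++) {
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        double ncr = factorial(5) / (factorial(2) * factorial(3));
        System.out.println("C(5, 2) = " + ncr);
        System.out.println("rounded = " + decformat(4, 0.263671875));
    }
}
```

Going through `Double.toString` before constructing the `BigDecimal` rounds the decimal value a reader would see printed, rather than the exact binary expansion of the double.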

Binomial distribution class: code implementation and detailed comments

The class comment is as follows:

package disttools;

/**
 * @(#) Binodist.java
 * @Description: For n Bernoulli trials, this algorithm computes the probability of r successes, the expectation of the binomial distribution, and its variance.
 * @Definitions: In mutually independent events, the probability of answering each question correctly is p and the probability of answering incorrectly is q. The probability of answering r of n questions correctly is P(X=r) = C_n^r * p^r * q^(n-r); this is called a binomial distribution, written X ~ B(n, p).
 * @Explanation: n: the total number of trials; X: the number of successes in n independent trials; p: the probability of success in a single trial; q: the probability of failure in a single trial.
 * @Comments:
 *   Conditions: a series of independent trials is carried out, each trial either succeeds or fails, the probability of success is the same in every trial, and the quantity of interest is the number of successes.
 *   Advantages: when the number of trials is fixed and the number of successes is wanted, the geometric distribution does not fit, and the binomial distribution handles such problems well.
 *   Disadvantages: when the number of trials is not fixed and the quantity of interest is how often an event occurs, neither the geometric nor the binomial distribution applies; this is where the Poisson distribution has the advantage.
 *   Applicable scenarios: n trials are carried out under identical conditions, each outcome is independent, and each trial has exactly two mutually exclusive outcomes; the goal is the number of successes in the n trials.
 *   Inapplicable scenarios: non-independent trials, or independent trials where the probability of the first success is wanted.
 *   Input/output parameters: see the individual methods.
 *   Exceptions/errors:
 *     Exception 1: the input data is invalid, for example a letter is entered where a double is required.
 *     Exception 2: the input data is too large for the computation to handle.
 *     Error 1: truncating the digits of the computed number of combinations introduces a slight error.
 *     Error 2: rounding to a fixed number of decimal places makes the result inexact.
 *   Resolution:
 *     Exception 1: show a hint when the input is invalid.
 *     Exception 2: catch the exception.
 *     Error 1: avoid rounding in intermediate calculations wherever possible to reduce the effect of the error.
 *     Error 2: use a helper that rounds to a configurable number of decimal places, set according to the precision required.
 * @Create date: August 3, 2015 20:29:13
 * @since JDK 1.6
 * @author Bai Ningchao
 */

Method 1: Javadoc and implementation for the probability of r successes

Comments:

/**
 * Probability of exactly r successes in n independent Bernoulli trials:
 * P(X=r) = C_n^r * p^r * q^(n-r), where C_n^r = n! / (r! * (n-r)!)
 * @param n int, the total number of independent trials
 * @param r int, the number of successes
 * @param p double, the probability of success in a single trial
 * @return px double rounded to four decimal places, the probability of r successes in n trials (for example, 2 correct answers out of 5 questions)
 */

Code:


public static double rsucess(int n, int r, double p) {
    double px = 0;
    double q = 1 - p; // the probability of failure is 1 - p
    try {
        // Number of ways to pick the r successes among n trials: C_n^r = n! / (r! * (n-r)!)
        double ncr = Numformat.factorial(n) / (Numformat.factorial(r) * Numformat.factorial(n - r));
        // Binomial probability from the formula P(X=r) = C_n^r * p^r * q^(n-r)
        px = ncr * Math.pow(p, r) * Math.pow(q, (double) (n - r));
        px = Numformat.decformat(4, px); // round to four decimal places
        System.out.println(">> Probability of " + r + " successes in " + n + " trials: " + px);
    } catch (Exception e) {
        System.out.println(">> Error message: " + e.getMessage());
    }
    return px;
}

Method 2: Javadoc and implementation for the expectation of the binomial distribution

Comments:

/**
 * Expectation of the binomial distribution for n Bernoulli trials: E(X) = np
 * @param n int, the number of trials
 * @param p double, the probability of success in a single trial
 * @return ex double rounded to four decimal places, the expectation of the binomial distribution
 */

Code:

public static double expectation(int n, double p) {
    double ex = 0;
    try {
        // Expectation from the binomial formula E(X) = np
        ex = Double.valueOf(n) * p;
        ex = Numformat.decformat(4, ex); // round to four decimal places
        System.out.println(">> Expectation of the binomial distribution: " + ex);
    } catch (Exception e) {
        System.out.println(">> Error message: " + e.getMessage());
    }
    return ex;
}

Method 3: Javadoc and implementation for the variance of the binomial distribution

Comments:

/**
 * Variance of the binomial distribution for n Bernoulli trials: Var(X) = npq, where q = 1 - p is computed inside the method
 * @param n int, the number of trials
 * @param p double, the probability of success in a single trial
 * @return vx double rounded to four decimal places, the variance of the binomial distribution
 */

Code:

public static double variance(int n, double p) {
    double vx = 0;
    double q = 1 - p; // the probability of failure is 1 - p
    try {
        // Variance from the binomial formula Var(X) = npq
        vx = Double.valueOf(n) * p * q;
        vx = Numformat.decformat(4, vx); // round to four decimal places
        System.out.println(">> Variance of the binomial distribution: " + vx);
    } catch (Exception e) {
        System.out.println(">> Error message: " + e.getMessage());
    }
    return vx;
}

Main function:

public static void main(String[] args) throws Exception {
    // px: probability of answering 2 of 5 questions correctly, 0.2637
    // Arguments: total number of questions n, number of correct answers r, success probability p of each independent event
    Binodist.rsucess(5, 2, 0.25);
    // ex: expectation of the binomial distribution, 1.25
    // Arguments: number of trials n, success probability p of each independent event
    Binodist.expectation(5, 0.25);
    // vx: variance of the binomial distribution, 0.9375
    // Arguments: number of trials n, success probability p of each independent event
    Binodist.variance(5, 0.25);
}

Note: PPT Download (extract code: 4849)

