Python Data Analysis: one-sample t-test, independent-samples t-test, correlation analysis, and contingency table analysis

Source: Internet
Author: User

1. Hypothesis Test

State a hypothesis, then use sample data to test it.

A confidence level must be chosen in advance, for example 95%.

Two types of errors:

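Both types of error are expressed as probabilities: a Type I error rejects a null hypothesis that is true, and a Type II error fails to reject one that is false.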
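The point that an error rate is a probability can be checked by simulation. The sketch below is only an illustration: it assumes synthetic normal data and a 5% significance level (neither comes from the course), repeats a one-sample t-test many times while the null hypothesis is actually true, and counts how often the test wrongly rejects it.

```python
import numpy as np
from scipy import stats

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Share of trials in which a true null hypothesis (mean = 0) is rejected."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated sample drawn while the null hypothesis holds.
    samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))
    _, p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1)
    return float(np.mean(p_values < alpha))

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen alpha, which is what a 95% confidence level means in practice.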

 

The null hypothesis is generally stated as an equality.

Influence of sample size:

Steps: state the hypothesis, choose the confidence level, collect the data, calculate the p-value.

T-test

 

Rejection region and acceptance region.

One-sample t-test: there is no data for this part of the course. Sorry, I will look for the data again later!
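Since the course data is missing, here is a minimal sketch of a one-sample t-test on made-up numbers. The spending values, the hypothesized mean of 1000, and the variable names are all assumptions for illustration. It follows the steps listed earlier: hypothesis, confidence level, data, p-value.

```python
import numpy as np
from scipy import stats

# Synthetic monthly spending figures (hypothetical, not the course data).
rng = np.random.default_rng(42)
spending = rng.normal(loc=1050, scale=200, size=40)

# Null hypothesis: the population mean equals 1000.
mu0 = 1000
t_stat, p_value = stats.ttest_1samp(spending, popmean=mu0)

# The same statistic by hand: (sample mean - mu0) / (s / sqrt(n)).
n = spending.size
t_manual = (spending.mean() - mu0) / (spending.std(ddof=1) / np.sqrt(n))

alpha = 0.05  # corresponds to a 95% confidence level
reject_h0 = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

If the p-value is below alpha, the t statistic falls in the rejection region; otherwise it falls in the acceptance region.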

 

Two Variables

Does average monthly expenditure differ between boys and girls?

Are the variances equal? Check with an F test first.

Then compute the t statistic.

Use descriptive statistics to select the variables.
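The two-step procedure above (variance check, then t-test) might look like the following sketch. The expenditure values are synthetic and the group sizes are arbitrary. The F test is written out by hand as a ratio of sample variances, which is one common way to do it; `scipy.stats.levene` is a more robust alternative when the data may not be normal.

```python
import numpy as np
from scipy import stats

# Synthetic monthly expenditure for two groups (hypothetical values).
rng = np.random.default_rng(1)
boys = rng.normal(loc=1200, scale=300, size=50)
girls = rng.normal(loc=1300, scale=320, size=45)

# Step 1: F test for equal variances, with the larger variance in the numerator.
var_b, var_g = boys.var(ddof=1), girls.var(ddof=1)
if var_b >= var_g:
    f_stat, dfn, dfd = var_b / var_g, boys.size - 1, girls.size - 1
else:
    f_stat, dfn, dfd = var_g / var_b, girls.size - 1, boys.size - 1
p_var = min(1.0, 2 * stats.f.sf(f_stat, dfn, dfd))  # two-sided p-value

# Step 2: pooled t-test if the variances look equal, Welch's t-test otherwise.
equal_var = bool(p_var >= 0.05)
t_stat, p_mean = stats.ttest_ind(boys, girls, equal_var=equal_var)
print(f"F = {f_stat:.3f} (p = {p_var:.3f}), t = {t_stat:.3f} (p = {p_mean:.3f})")
```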

 

 

 

 

 

Analysis of variance (ANOVA)

Does credit card spending differ by education level?

 

Total variation:

Intra-group variation:

 

Inter-group variation:

 

My own understanding:

Total variation: the sum of squares of (each sample value - overall mean).

Intra-group variation: the sum of squares of (sample value - mean of the group the sample belongs to), computed inside each group and then added up across all groups.

Inter-group variation: the sum of squares of (group mean - overall mean), with each group weighted by the number of samples it contains.
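A small numeric check of the three definitions above, using made-up spending values for three hypothetical education levels. It shows that the total variation splits exactly into the intra-group and inter-group parts.

```python
import numpy as np

# Hypothetical credit card spending for three education levels.
groups = {
    "high_school": np.array([3.1, 2.8, 3.5, 3.0]),
    "bachelor": np.array([4.2, 3.9, 4.5, 4.1, 4.4]),
    "master": np.array([5.0, 5.3, 4.8]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()

# Total variation: every value against the overall mean.
sst = float(np.sum((all_values - grand_mean) ** 2))
# Intra-group variation: every value against its own group mean.
ssw = float(sum(np.sum((g - g.mean()) ** 2) for g in groups.values()))
# Inter-group variation: every group mean against the overall mean,
# weighted by the size of the group.
ssb = float(sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values()))

print(f"total = {sst:.4f}, intra + inter = {ssw + ssb:.4f}")
```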

This makes sense to me, though I am not sure it will to others. It is easier to see with a concrete example!

F Statistic

 

Requirements.

The function expects the data as one array per group. Therefore, build one array for each group and pass them to the f_oneway() function from scipy.stats. The second value it returns is the p-value.

The same result can also be obtained with statsmodels.
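A minimal sketch of that call, assuming the data sits in a pandas DataFrame with a grouping column and a value column. The column names `edu` and `spend` and all the numbers are invented for the example.

```python
import pandas as pd
from scipy import stats

# Hypothetical data: one row per person.
df = pd.DataFrame({
    "edu": ["hs"] * 4 + ["ba"] * 5 + ["ma"] * 3,
    "spend": [3.1, 2.8, 3.5, 3.0, 4.2, 3.9, 4.5, 4.1, 4.4, 5.0, 5.3, 4.8],
})

# Build one array per group, as f_oneway expects.
samples = [grp["spend"].to_numpy() for _, grp in df.groupby("edu")]

# f_oneway returns the F statistic first and the p-value second.
f_stat, p_value = stats.f_oneway(*samples)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```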

 

Multi-factor analysis of variance

R-squared

Fit a linear regression:

 

 

Add interaction terms:

 

Two continuous variables

 

Correlation analysis:

Scatter plot: first check whether the relationship looks linear.

Types of correlation coefficient; Pearson is the one used most often.

 

Correlation Coefficient Calculation

Correlation coefficient and strength of correlation

Correlation coefficient test

Look at the Code:

Scatter plot

Calculate Correlation Coefficient
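The original code is not available, so here is a sketch of the calculation on synthetic data (the variables `income` and `spend` are invented). It computes the Pearson coefficient with scipy and then repeats the significance test by hand with the usual t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2).

```python
import numpy as np
from scipy import stats

# Synthetic data with a built-in linear relationship (hypothetical).
rng = np.random.default_rng(3)
income = rng.normal(loc=8, scale=2, size=100)
spend = 0.5 * income + rng.normal(scale=1.0, size=100)

# Pearson correlation coefficient and its two-sided p-value.
r, p_value = stats.pearsonr(income, spend)

# The same coefficient from the correlation matrix.
r_manual = np.corrcoef(income, spend)[0, 1]

# Significance test by hand, using n - 2 degrees of freedom.
n = income.size
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"r = {r:.3f}, p = {p_value:.3g}")
```

For the scatter plot step, a call such as matplotlib's `plt.scatter(income, spend)` is the usual first look before computing anything.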

 

In general, the significance test of the correlation coefficient is not the main concern as long as the sample size is large enough.

Categorical variable versus categorical variable

Association between categorical variables

Contingency table analysis

Is the manager's temper related to the weather? Not related:

 

 

Related:

Analyze the relationship between default and bankruptcy. A row profile gives percentages within each row, so the rows can be compared across the columns.

A column profile gives percentages within each column, so the columns can be compared across the rows. If the profiles differ only slightly, the effect is small. This is a rough analysis and not rigorous!
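Row and column profiles can be produced with `pandas.crosstab` and its `normalize` argument. The records and the column names `bankrupt` and `default` below are invented for illustration.

```python
import pandas as pd

# Hypothetical records; 1 means the event happened, 0 means it did not.
df = pd.DataFrame({
    "bankrupt": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "default":  [0, 0, 0, 0, 1, 0, 1, 1, 0, 1],
})

# Frequency table (counts).
counts = pd.crosstab(df["bankrupt"], df["default"])
# Row profile: each row sums to 1.
row_profile = pd.crosstab(df["bankrupt"], df["default"], normalize="index")
# Column profile: each column sums to 1.
col_profile = pd.crosstab(df["bankrupt"], df["default"], normalize="columns")

print(counts)
print(row_profile.round(3))
print(col_profile.round(3))
```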

 

Use a slightly more rigorous method:

Hypothesis Test

Chi-square test 1:

Chi-square test 2:

Python implementation:

This is the frequency table:

The data is a cross table:
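A sketch of the chi-square test on a cross table, using `scipy.stats.chi2_contingency`. The counts and the labels `bankrupt` and `default` are made up. Note that for a 2x2 table this function applies Yates' continuity correction by default.

```python
import pandas as pd
from scipy import stats

# Hypothetical cross table of counts: rows are bankrupt (0/1), columns are default (0/1).
table = pd.DataFrame(
    [[500, 60], [40, 100]],
    index=pd.Index([0, 1], name="bankrupt"),
    columns=pd.Index([0, 1], name="default"),
)

# Returns the statistic, the p-value, the degrees of freedom,
# and the counts expected if the two variables were independent.
chi2, p_value, dof, expected = stats.chi2_contingency(table.to_numpy())

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
print(expected.round(1))
```

A small p-value suggests the two variables are associated. A large one means the table is consistent with independence.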

 

 

All of the tests above assume normally distributed data.

Note: data that are not exactly normal can be treated as approximately normal.

Note: the sample size should not be too small, but it should not be too large either. n appears in both the numerator and the denominator of this formula, so a very large sample strongly affects the t statistic.
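To see the sample-size effect in numbers, the sketch below assumes the formula in question is the one-sample t statistic, t = (sample mean - hypothesized mean) / (s / sqrt(n)). The mean difference and standard deviation are arbitrary illustrative values held fixed while n grows.

```python
import numpy as np

def one_sample_t(mean_diff, sample_std, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return float(mean_diff / (sample_std / np.sqrt(n)))

# Same difference and same spread; only the sample size changes.
t_values = {n: one_sample_t(mean_diff=1.0, sample_std=10.0, n=n)
            for n in (10, 100, 10_000)}

for n, t in t_values.items():
    print(f"n = {n:>6}: t = {t:.2f}")
```

With everything else fixed, t grows with the square root of n, so a very large sample can make even a tiny difference statistically significant.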

 
