Python Data Analysis: one-sample t-test, independent-samples t-test, correlation analysis, and contingency table analysis
1. Hypothesis Test
Make a hypothesis and verify it.
A confidence level needs to be set, such as 95%.
Two types of errors (Type I and Type II):
Both types of errors are expressed as probabilities.
The null hypothesis is generally stated as an equality.
Influence of sample size:
Steps: state the hypothesis, set the confidence level, collect the data, calculate the p-value.
T-test
Rejection region and acceptance region.
One-sample t-test: there is no data for this part of the course. Sorry, I will look for the data again later!
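Until the course data turns up, here is a minimal sketch of a one-sample t-test on synthetic data. The sample, the hypothesized mean `mu0`, and the 0.05 threshold are all assumptions made for illustration. `scipy.stats.ttest_1samp` runs the test, and the hand calculation checks the t statistic.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): 50 monthly spending values, not the course data.
rng = np.random.default_rng(0)
sample = rng.normal(loc=1050.0, scale=200.0, size=50)

mu0 = 1000.0   # hypothesized population mean (null hypothesis: mean == mu0)
alpha = 0.05   # significance level matching a 95% confidence level

# Two-sided one-sample t-test.
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)

# The same t statistic by hand: (sample mean - mu0) / (s / sqrt(n)).
n = sample.size
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

# Reject the null hypothesis when the p-value falls below alpha.
reject_null = bool(p_value < alpha)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```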
Two Variables
Does average monthly spending differ between men and women?
Are the variances equal? Use an F test!
Then compute the t statistic!
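A sketch of those two steps on synthetic data: first a variance-ratio F test, then the independent-samples t-test. The two groups and the 0.05 cut-off are assumptions. The F test is written by hand with `scipy.stats.f`, which is one simple way to do it and not necessarily what the course used.

```python
import numpy as np
from scipy import stats

# Synthetic monthly spending for two groups (assumption, not the course data).
rng = np.random.default_rng(1)
group_a = rng.normal(loc=1200.0, scale=300.0, size=40)
group_b = rng.normal(loc=1000.0, scale=300.0, size=45)

# Step 1: variance-ratio F test for equal variances.
f_stat = group_a.var(ddof=1) / group_b.var(ddof=1)
dfn, dfd = group_a.size - 1, group_b.size - 1
# Two-sided p-value: double the smaller tail probability.
p_var = 2 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
equal_var = bool(p_var >= 0.05)

# Step 2: independent-samples t-test.
# With equal_var=False, SciPy runs Welch's t-test instead of the pooled one.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
print(f"F = {f_stat:.3f} (p = {p_var:.4f}), equal variances: {equal_var}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```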
Use descriptive statistics to screen the variables
Analysis of variance (ANOVA)
Does credit card spending differ by education level?
Total variation:
Within-group variation:
Between-group variation:
My own understanding: Total variation: sum of squares of (each individual sample value - overall mean)
Within-group variation: for each group, the sum of squares of (sample value - mean of the group the sample belongs to), added up over all groups
Between-group variation: sum of squares of (mean of each group - overall mean), with each group weighted by its number of samples
This makes sense to me, but I don't know if it will to others. It is easier to see with an example!
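A small worked example of the three sums of squares, using made-up numbers (the group names and values are assumptions). It also checks that total variation equals within-group plus between-group variation.

```python
import numpy as np

# Made-up spending values for three education groups (assumption).
groups = {
    "high_school": np.array([3.0, 4.0, 5.0]),
    "bachelor": np.array([6.0, 7.0, 8.0]),
    "master": np.array([9.0, 10.0, 11.0]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()  # 7.0

# Total variation: every value against the overall mean.
sst = ((all_values - grand_mean) ** 2).sum()

# Within-group variation: every value against its own group mean.
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

# Between-group variation: every group mean against the overall mean,
# weighted by the group size.
ssb = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())

print(sst, ssw, ssb)  # 60.0 6.0 54.0
```

Here 60 = 6 + 54, so the decomposition holds.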
F Statistic
Requirements.
The function needs the data as one array per group, so split the data by group and pass the groups to the f_oneway() function! The second value in the output is the p-value.
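A minimal sketch of that call with `scipy.stats.f_oneway`. The DataFrame and its column names `education` and `spend` are made up for illustration. The F statistic is also computed by hand from the sums of squares to show where it comes from.

```python
import pandas as pd
from scipy import stats

# Made-up data (assumption): one row per customer.
df = pd.DataFrame({
    "education": ["high_school"] * 3 + ["bachelor"] * 3 + ["master"] * 3,
    "spend": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
})

# f_oneway wants one array per group, so split the column by group.
samples = [grp["spend"].to_numpy() for _, grp in df.groupby("education")]
f_stat, p_value = stats.f_oneway(*samples)

# By hand: F = (SSB / (k - 1)) / (SSW / (N - k)).
k, n_total = len(samples), len(df)
grand_mean = df["spend"].mean()
ssb = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
ssw = sum(((s - s.mean()) ** 2).sum() for s in samples)
f_manual = (ssb / (k - 1)) / (ssw / (n_total - k))

print(f"F = {f_stat:.3f}, p = {p_value:.4f}")  # F = 27.000, p = 0.0010
```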
The same result can be obtained through statsmodels.
Multi-factor ANOVA
R side
Fit a linear regression:
Add interaction terms
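A sketch of a two-factor model with and without interaction terms, using the statsmodels formula interface. The factors `education` and `gender`, the response `spend`, and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic balanced design (assumption): 3 education levels x 2 genders.
rng = np.random.default_rng(2)
n = 120
df = pd.DataFrame({
    "education": np.tile(["high_school", "bachelor", "master"], 40),
    "gender": np.repeat(["female", "male"], 60),
})
df["spend"] = 1000.0 + 100.0 * (df["gender"] == "male") + rng.normal(0.0, 50.0, n)

# Main effects only: each factor shifts the mean independently.
main_effects = ols("spend ~ C(education) + C(gender)", data=df).fit()

# "*" expands to both main effects plus their interaction terms.
with_interaction = ols("spend ~ C(education) * C(gender)", data=df).fit()

print(main_effects.params.index.tolist())
print(with_interaction.params.index.tolist())
```

The interaction model has two extra coefficients here, one for each non-reference education level combined with the non-reference gender.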
Two continuous variables!
Correlation analysis:
Scatter plot: check whether the relationship is linear. Let's take a look!
Types of correlation coefficient; Pearson is the one used most often.
Correlation Coefficient Calculation
Correlation coefficient and strength of correlation
Correlation coefficient test
Look at the Code:
Scatter plot
Calculate Correlation Coefficient
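A minimal sketch of the calculation with `scipy.stats.pearsonr`, which returns the Pearson coefficient together with the p-value of its significance test. The variables `income` and `spend` are synthetic and only stand in for the course data.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): spending rises roughly linearly with income.
rng = np.random.default_rng(3)
income = rng.normal(5000.0, 1000.0, size=100)
spend = 0.3 * income + rng.normal(0.0, 200.0, size=100)

# Pearson correlation coefficient and the p-value of its test.
r, p_value = stats.pearsonr(income, spend)

# The same coefficient from NumPy's correlation matrix.
r_manual = np.corrcoef(income, spend)[0, 1]

print(f"r = {r:.3f}, p = {p_value:.3g}")
```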
Generally, the significance test for the correlation coefficient is not relied on, as long as the sample size is large enough.
Categorical variable versus categorical variable
Association between categorical variables
Contingency table analysis
Is the manager's temper related to the weather? Not related:
Related:
Analyze the relationship between default and bankruptcy. Row percentages are compared down each column.
Column percentages are compared across each row. If the percentages differ little, the effect is small. This is a rough analysis. Not rigorous!
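A sketch of the row and column percentages with `pandas.crosstab`. The ten records and the column names `bankrupt` and `default` are made up for illustration.

```python
import pandas as pd

# Made-up records (assumption): whether a customer went bankrupt / defaulted.
df = pd.DataFrame({
    "bankrupt": ["yes"] * 4 + ["no"] * 6,
    "default": ["yes", "yes", "yes", "no", "yes", "no", "no", "no", "no", "no"],
})

# Raw frequency table.
counts = pd.crosstab(df["bankrupt"], df["default"])

# Row percentages: each row sums to 1, so compare values down a column.
row_pct = pd.crosstab(df["bankrupt"], df["default"], normalize="index")

# Column percentages: each column sums to 1, so compare values across a row.
col_pct = pd.crosstab(df["bankrupt"], df["default"], normalize="columns")

print(counts)
print(row_pct)
print(col_pct)
```

In this toy table 75% of the bankrupt customers defaulted against about 17% of the others, which is the kind of gap the rough comparison looks for.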
Use a somewhat more rigorous method:
Hypothesis Test
Chi-square test 1:
Chi-square test 2:
Python implementation:
This is the frequency table!
The data is a cross-tabulation (contingency table):
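A minimal sketch of the chi-square test on a cross-tabulation with `scipy.stats.chi2_contingency`. The counts are made up, and `correction=False` is chosen here so the result matches the plain hand formula (SciPy applies Yates' correction to 2x2 tables by default).

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up frequency table (assumption): rows = bankrupt, columns = default.
table = pd.DataFrame(
    [[30, 10], [20, 40]],
    index=pd.Index(["yes", "no"], name="bankrupt"),
    columns=pd.Index(["yes", "no"], name="default"),
)

observed = table.to_numpy()
chi2, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)

# By hand: expected count = row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

print(f"chi2 = {chi2:.4f}, dof = {dof}, p = {p_value:.3g}")  # chi2 = 16.6667, dof = 1
```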
All of the above assume that the data is normally distributed.
Note: for non-normal data, consider transforming it toward a normal distribution.
Note: the sample size cannot be too small, but it cannot be too large either. n appears in both the numerator and the denominator of this formula, so a very large sample affects how the statistic t changes!
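A small illustration of the sample-size effect, assuming the one-sample statistic t = (mean difference) / (s / sqrt(n)). The mean difference and standard deviation are made-up numbers held fixed while only n changes, and `t_and_p` is a hypothetical helper written for this note.

```python
import numpy as np
from scipy import stats


def t_and_p(n, mean_diff=1.0, sample_sd=10.0):
    """Return the t statistic and two-sided p-value for a fixed effect and a given n."""
    t_stat = mean_diff / (sample_sd / np.sqrt(n))
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    return float(t_stat), float(p_value)


# The same small difference looks more and more "significant" as n grows.
for n in (25, 400, 10000):
    t_stat, p_value = t_and_p(n)
    print(f"n = {n:5d}: t = {t_stat:.2f}, p = {p_value:.3g}")
```

With the effect held fixed, t grows with the square root of n (0.5, 2, and 10 here), so a very large sample can make even a tiny difference pass the test.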