Statistical methods
pandas objects come with a set of statistical methods. Most of them are reductions and summary statistics: they extract a single value (such as a sum or a mean) from a Series, or a Series of values from the rows or columns of a DataFrame.
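To make the distinction concrete, here is a minimal sketch. The data and the names `s`, `df`, `total`, `col_sums` and `row_sums` are made up for illustration and do not come from the original text.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],
)

# Reducing a Series yields a single scalar value.
total = s.sum()

# Reducing a DataFrame yields a Series:
# one value per column with axis=0 (the default) ...
col_sums = df.sum()

# ... or one value per row with axis=1.
row_sums = df.sum(axis=1)

print(total)
print(col_sums)
print(row_sums)
```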
These methods skip NA values by default. For example, with DataFrame.mean(axis=0, skipna=True), missing values are simply excluded from the calculation; the result is NA only when the entire slice (row or column) is NA. To disable this behavior, pass skipna=False:
>>> df
    one  two
a  1.40  NaN
b  7.10 -4.5
c   NaN  NaN
d  0.75 -1.3

[4 rows x 2 columns]
>>> df.mean()
one    3.083333
two   -2.900000
dtype: float64
>>> df.mean(axis=1)
a    1.400
b    1.300
c      NaN
d   -0.275
dtype: float64
>>> df.mean(axis=1, skipna=False)
a      NaN
b    1.300
c      NaN
d   -0.275
dtype: float64
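The session above does not show how df was built. The following is a minimal sketch that reproduces it as a script. The construction (a nested list with NumPy's `np.nan` as the missing-value marker) and the result variable names are assumptions; only the values and labels are taken from the printed output.

```python
import numpy as np
import pandas as pd

# Assumed construction; values and labels come from the printed frame.
df = pd.DataFrame(
    [[1.40, np.nan], [7.10, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

# Default: NA values are skipped, giving one mean per column.
col_means = df.mean()

# Row-wise means; row "c" is entirely NA, so its mean is NaN.
row_means = df.mean(axis=1)

# With skipna=False, any NA in a row makes that row's result NaN.
row_means_strict = df.mean(axis=1, skipna=False)

print(col_means)
print(row_means)
print(row_means_strict)
```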
Other commonly used statistical methods are:
| Method | Description |
| count | Number of non-NA values |
| describe | Compute a set of summary statistics for a Series or for each DataFrame column |
| min, max | Minimum and maximum values |
| argmin, argmax | Integer positions at which the minimum and maximum values occur (Series only) |
| idxmin, idxmax | Index labels at which the minimum and maximum values occur |
| quantile | Sample quantile, for a fraction between 0 and 1 |
| sum | Sum of values |
| mean | Mean of values |
| median | Arithmetic median (50% quantile) of values |
| mad | Mean absolute deviation from the mean value (deprecated in pandas 1.5 and removed in 2.0) |
| var | Sample variance of values |
| std | Sample standard deviation of values |
| skew | Sample skewness (third moment) of values |
| kurt | Sample kurtosis (fourth moment) of values |
| cumsum | Cumulative sum of values |
| cummin, cummax | Cumulative minimum and cumulative maximum of values |
| cumprod | Cumulative product of values |
| diff | First-order difference (useful for time series) |
| pct_change | Percent change between consecutive values, returned as a fraction |
Table: Common pandas statistical methods
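A few of the methods listed above can be sketched on a small frame. The data and all variable names here are hypothetical, chosen so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"one": [1.0, 4.0, 2.0, 8.0], "two": [5.0, np.nan, 3.0, 6.0]},
    index=["a", "b", "c", "d"],
)

counts = df.count()                   # non-NA values per column
labels_of_max = df.idxmax()           # index label of each column's maximum
pos_of_max = df["one"].argmax()       # integer position of the maximum in a Series
median_one = df["one"].quantile(0.5)  # the 0.5 quantile, i.e. the median
running = df["one"].cumsum()          # cumulative sum
steps = df["one"].diff()              # first-order difference; first entry is NaN
rel = df["one"].pct_change()          # fractional change from the previous row

# describe() reports several summary statistics at once.
print(df.describe())
```

Note that idxmax returns an index label ("d" here) while argmax returns an integer position (3 here); the two coincide only when the index is the default 0-based range.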