Probability and Statistics--correlations&covariance

Last Update:2016-04-29 Source: Internet

Author: User


Skewness

In probability theory and statistics, skewness measures the asymmetry of the probability distribution of a real-valued random variable. Skewness can be positive, negative, or undefined. A negative skew means that the left tail of the probability density function is longer than the right, and that most of the values (usually including the median) lie to the right of the mean. A positive skew means that the right tail is longer than the left, and that most of the values (usually including the median) lie to the left of the mean. Zero skewness means that the tails on either side of the mean balance out overall. This is the case for any symmetric distribution, but zero skewness by itself does not imply that the distribution is symmetric.

import matplotlib.pyplot as plt

# Histogram of each of the three score arrays
plt.hist(test_scores_negative)
plt.show()

plt.hist(test_scores_positive)
plt.show()

plt.hist(test_scores_normal)
plt.show()

from scipy.stats import skew

negative_skew = skew(test_scores_negative)
positive_skew = skew(test_scores_positive)
no_skew = skew(test_scores_normal)
'''
-0.6093247474592194
0.5376950498203763
0.0223645171350847
'''

The figures are histograms of the three data sets. In the first, most of the data lies to the right of the mean (negative skew). In the second, most of the data lies to the left of the mean (positive skew). In the last, the data is centered around the mean.
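The test_scores_* arrays are never defined in the snippet above. As a minimal sketch, assuming synthetic data is acceptable, arrays with the same qualitative shapes can be generated with NumPy. The variable names match those used above, but the beta and normal distributions and their parameters are illustrative assumptions, not the original data.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Assumed stand-ins for the article's data (scores on a 0-100 scale).
test_scores_negative = 100 * rng.beta(5, 2, size=5000)  # long left tail
test_scores_positive = 100 * rng.beta(2, 5, size=5000)  # long right tail
test_scores_normal = rng.normal(50, 10, size=5000)      # roughly symmetric

for name, scores in [("negative", test_scores_negative),
                     ("positive", test_scores_positive),
                     ("normal", test_scores_normal)]:
    # Print the skewness next to the mean and median of each array.
    print(name, "skew:", skew(scores),
          "mean:", scores.mean(), "median:", np.median(scores))
```

With data like this, the sign of the skewness should match the direction of the longer tail, and the mean should sit on the long-tail side of the median.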

Kurtosis

In statistics, kurtosis measures how peaked the probability distribution of a real-valued random variable is. Higher kurtosis means that more of the variance comes from infrequent extreme deviations above or below the mean.
Kurtosis and skewness are two measures used to check how closely data follow a normal distribution. Kurtosis describes how flat or peaked the distribution is: a distribution with heavy tails has a large kurtosis value, and a normal distribution has a kurtosis of 3. Skewness measures symmetry: 0 indicates perfect symmetry, and the skewness of a normal distribution is 0.

The formula for kurtosis is as follows:
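For reference, the standard moment-based (Pearson) definition, which gives the value 3 for a normal distribution, can be written as below. Here \mu is the mean and \sigma the standard deviation of X. The second line is the corresponding sample version for observations x_1, ..., x_n with sample mean \bar{x}.

```latex
\operatorname{Kurt}[X] = \frac{\operatorname{E}\left[(X-\mu)^{4}\right]}{\sigma^{4}}
\qquad
\widehat{\operatorname{Kurt}} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{4}}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}\right)^{2}}
```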

The formula for skewness is as follows:
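For reference, the standard moment-based definition of skewness (the third standardized moment) can be written as below, with \mu the mean and \sigma the standard deviation of X, followed by the sample version for observations x_1, ..., x_n with sample mean \bar{x}.

```latex
\operatorname{Skew}[X] = \frac{\operatorname{E}\left[(X-\mu)^{3}\right]}{\sigma^{3}}
\qquad
\widehat{\operatorname{Skew}} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{3}}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}\right)^{3/2}}
```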

from scipy.stats import kurtosis
kurt_platy = kurtosis(test_scores_platy)
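A minimal sketch of the usual moment-based definitions (third and fourth standardized moments), checked against SciPy on synthetic data. The helper names moment_skew and moment_kurtosis and the random sample are assumptions made for illustration. Note that scipy.stats.kurtosis returns excess kurtosis (kurtosis minus 3) by default, so a normal sample gives a value near 0 unless fisher=False is passed.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def moment_skew(x):
    """Third standardized moment, using divide-by-n moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def moment_kurtosis(x):
    """Fourth standardized moment, using divide-by-n moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

rng = np.random.default_rng(1)
sample = rng.normal(0, 1, size=100_000)  # hypothetical data, not the article's

print(moment_skew(sample), skew(sample))
print(moment_kurtosis(sample), kurtosis(sample, fisher=False))
print(kurtosis(sample))  # SciPy default: excess kurtosis
```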

Modality

Modality refers to the number of modes, or peaks, in a distribution. Real-world data is often unimodal (it has only one mode).

import matplotlib.pyplot as plt

# This plot has one mode, making it unimodal
plt.hist(test_scores_uni)
plt.show()

# This plot has two peaks, and is bimodal
# This could happen if one group of students learned the material, and one learned something else, for example.
plt.hist(test_scores_bi)
plt.show()

# More than one peak means that the plot is multimodal
# We can't easily measure the modality of a plot, like we can with kurtosis or skew.
# Often, the best way to detect multimodality is to observe the plot.
plt.hist(test_scores_multi)
plt.show()
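Although inspecting the plot remains the most reliable check, a crude heuristic is to count local maxima in the histogram bin counts. The sketch below is only that: count_peaks is a hypothetical helper, and the bin counts are hand-made values chosen to illustrate the idea.

```python
import numpy as np

def count_peaks(counts):
    """Count bins strictly higher than both neighbours (edges compared to 0)."""
    padded = np.concatenate(([0], np.asarray(counts), [0]))
    middle = padded[1:-1]
    is_peak = (middle > padded[:-2]) & (middle > padded[2:])
    return int(is_peak.sum())

# Hand-made histogram counts, for illustration only.
unimodal_counts = [1, 4, 9, 4, 1]
bimodal_counts = [1, 6, 9, 3, 1, 4, 8, 2]

print(count_peaks(unimodal_counts))
print(count_peaks(bimodal_counts))
```

For real data the counts could come from np.histogram(scores)[0]. Sampling noise easily produces spurious local maxima, so the result depends heavily on the bin width and should be read alongside the plot.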

Mean

import matplotlib.pyplot as plt

plt.hist(test_scores_normal)
# The axvline function will plot a vertical line over an existing plot
plt.axvline(test_scores_normal.mean())
plt.show()

plt.hist(test_scores_negative)
plt.axvline(test_scores_negative.mean())
plt.show()

plt.hist(test_scores_positive)
plt.axvline(test_scores_positive.mean())
plt.show()

Median

Displaying the median and the mean together

import numpy
import matplotlib.pyplot as plt

# Plot the histogram
plt.hist(test_scores_negative)
# Compute the median
median = numpy.median(test_scores_negative)
# Plot the median in blue (the color argument of "b" means blue)
plt.axvline(median, color="b")
# Plot the mean in red
plt.axvline(test_scores_negative.mean(), color="r")
# Notice how the median is further to the right than the mean?
# It's less sensitive to outliers, and isn't pulled to the left.
plt.show()

plt.hist(test_scores_positive)
plt.axvline(numpy.median(test_scores_positive), color="b")
plt.axvline(test_scores_positive.mean(), color="r")
plt.show()
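The claim that the median is less sensitive to outliers than the mean can be checked on a tiny example. The scores below are made-up values, assumed only for illustration.

```python
import numpy as np

scores = np.array([70, 72, 75, 78, 80])
with_outlier = np.append(scores, 5)  # add one very low score

# Mean and median before adding the outlier
print(scores.mean(), np.median(scores))
# Mean and median after adding the outlier
print(with_outlier.mean(), np.median(with_outlier))
```

The single low score moves the mean by more than 11 points, while the median moves by only 1.5.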

The following statistical analysis is based on an NBA data set whose format is roughly as follows:

player,pos,age,bref_team_id,g,gs,mp,fg,fga,fg.,x3p,x3pa,x3p.,x2p,x2pa,x2p.,efg.,ft,fta,ft.,orb,drb,trb,ast,stl,blk,tov,pf,pts,season,season_end
Quincy Acy,SF,23,TOT,63,0,847,66,141,0.468,4,15,0.266666666666667,62,126,0.492063492063492,0.482,35,53,0.66,72,144,216,28,23,26,30,122,171,2013-2014,2013
Steven Adams,C,20,OKC,81,20,1197,93,185,0.503,0,0,NA,93,185,0.502702702702703,0.503,79,136,0.581,142,190,332,43,40,57,71,203,265,2013-2014,2013

player – name of the player.
pts – total number of points the player scored in the season.
ast – total number of assists the player had in the season.
fg. – player's field goal percentage for the season.
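The article does not say how nba_stats is loaded. A minimal sketch, assuming it is a pandas DataFrame read from CSV text in the format shown above, is given below. The two sample rows are embedded as a string so the example runs on its own, and the casing of names and team codes is an assumption. The real file name is not given in the source, so none is used here.

```python
import io
import pandas as pd

# Header and the two sample rows from the excerpt above.
csv_text = """player,pos,age,bref_team_id,g,gs,mp,fg,fga,fg.,x3p,x3pa,x3p.,x2p,x2pa,x2p.,efg.,ft,fta,ft.,orb,drb,trb,ast,stl,blk,tov,pf,pts,season,season_end
Quincy Acy,SF,23,TOT,63,0,847,66,141,0.468,4,15,0.266666666666667,62,126,0.492063492063492,0.482,35,53,0.66,72,144,216,28,23,26,30,122,171,2013-2014,2013
Steven Adams,C,20,OKC,81,20,1197,93,185,0.503,0,0,NA,93,185,0.502702702702703,0.503,79,136,0.581,142,190,332,43,40,57,71,203,265,2013-2014,2013
"""

# read_csv treats the string "NA" as a missing value by default.
nba_stats = pd.read_csv(io.StringIO(csv_text))

print(nba_stats.shape)
print(nba_stats[["player", "pts", "ast", "fg."]])
```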

Calculating Standard Deviation

Note: in fact, the std() function can calculate the standard deviation directly.

# The nba stats are loaded into the nba_stats variable.
def calc_column_deviation(column):
    mean = column.mean()
    variance = 0
    for p in column:
        # Squared distance of each value from the mean
        difference = p - mean
        square_difference = difference ** 2
        variance += square_difference
    # Divide by the number of values (population variance)
    variance = variance / len(column)
    return variance ** (1/2)

mp_dev = calc_column_deviation(nba_stats["mp"])
ast_dev = calc_column_deviation(nba_stats["ast"])
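One caveat when comparing the hand-written function with a library std(): the function above divides by n (population standard deviation), while some library defaults divide by n - 1. The sketch below uses a small made-up list and a hypothetical helper calc_deviation that mirrors the same calculation.

```python
import numpy as np

def calc_deviation(values):
    """Population standard deviation: divide the summed squares by n."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return variance ** 0.5

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

print(calc_deviation(data))      # population standard deviation
print(np.std(data))              # NumPy default, ddof=0 (divides by n)
print(np.std(data, ddof=1))      # sample standard deviation (divides by n - 1)
```

If nba_stats is a pandas DataFrame, which is an assumption here, nba_stats["mp"].std() uses ddof=1 by default and will therefore be slightly larger than calc_column_deviation(nba_stats["mp"]). Passing ddof=0 makes the two agree.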

Normal distribution

norm.pdf takes an array of points and returns, for each point, the probability density of a normal distribution with the given mean and standard deviation.

import numpy as np
import matplotlib.pyplot as plt
# The norm module has a pdf function (pdf stands for probability density function)
from scipy.stats import norm

# The arange function generates a numpy vector
# The vector below will start at -1, and go up to, but not including 1
# It will proceed in "steps" of .01. So the first element will be -1, the second -.99, the third -.98, all the way up to .99.
points = np.arange(-1, 1, 0.01)

# The norm.pdf function will take the points vector and turn it into a probability vector
# Each element in the vector will correspond to the normal distribution (earlier elements and later elements smaller, peak in the center)
# The distribution will be centered on 0, and will have a standard deviation of .3
probabilities = norm.pdf(points, 0, .3)

# Plot the points values on the x axis and the corresponding probabilities on the y axis
# See the bell curve?
plt.plot(points, probabilities)
plt.show()

points = np.arange(-10, 10, 0.1)
probabilities = norm.pdf(points, 0, 2)
plt.plot(points, probabilities)
plt.show()
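To make explicit what norm.pdf computes, the sketch below evaluates the normal density formula by hand and compares it with SciPy. The helper name normal_pdf is an assumption introduced for illustration. The example also shows that the returned values are densities rather than probabilities: with a standard deviation of 0.3 the peak exceeds 1, while the area under the curve stays close to 1.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

points = np.arange(-1, 1, 0.01)
manual = normal_pdf(points, 0, 0.3)
library = norm.pdf(points, 0, 0.3)

print(np.allclose(manual, library))  # the two calculations agree
print(library.max())                 # peak density, larger than 1 for sd = 0.3
print((library * 0.01).sum())        # Riemann-sum approximation of the area under the curve
```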

Covariance

# The nba_stats variable has been loaded.
def covariance(x, y):
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    # Deviation of each value from its column mean
    x_diffs = [i - x_mean for i in x]
    y_diffs = [i - y_mean for i in y]
    # Product of the paired deviations, averaged over all rows
    codeviates = [x_diffs[i] * y_diffs[i] for i in range(len(x))]
    return sum(codeviates) / len(codeviates)

cov_stl_pf = covariance(nba_stats["stl"], nba_stats["pf"])
cov_fta_pts = covariance(nba_stats["fta"], nba_stats["pts"])
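The covariance function above divides by n. NumPy's cov divides by n - 1 unless bias=True is passed, so the two differ slightly by default. The sketch below illustrates this on made-up lists, using a compact rewrite of the same calculation.

```python
import numpy as np

def covariance(x, y):
    """Average product of paired deviations from the means (divides by n)."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    return sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / len(x)

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

print(covariance(x, y))               # divides by n
print(np.cov(x, y, bias=True)[0, 1])  # same normalisation as above
print(np.cov(x, y)[0, 1])             # NumPy default divides by n - 1
```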

Correlations

The most common measure of correlation is Pearson's r, also called the r-value.

from scipy.stats import pearsonr

r, p_value = pearsonr(nba_stats["fga"], nba_stats["pts"])
# As we can see, this is a very high positive r value -- close to 1
print(r)

r_fta_pts, p_value = pearsonr(nba_stats["fta"], nba_stats["pts"])
r_stl_pf, p_value = pearsonr(nba_stats["stl"], nba_stats["pf"])
'''
0.369861731248
'''

The formula for correlation is as follows:
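The standard definition of Pearson's r divides the covariance of the two variables by the product of their standard deviations. In sample form, with \bar{x} and \bar{y} the sample means:

```latex
r = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}\,\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}
```

The normalisation (n or n - 1) cancels between numerator and denominator, provided the same one is used for the covariance and for both variances.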

from numpy import cov

# The nba_stats variable has been loaded in.
r_fta_blk = cov(nba_stats["fta"], nba_stats["blk"])[0, 1] / ((nba_stats["fta"].var() * nba_stats["blk"].var()) ** (1/2))
r_ast_stl = cov(nba_stats["ast"], nba_stats["stl"])[0, 1] / ((nba_stats["ast"].var() * nba_stats["stl"].var()) ** (1/2))
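As a quick check, the covariance-over-standard-deviations calculation can be compared with pearsonr on a small made-up data set. The arrays below are assumptions chosen so that r works out to 0.8. With plain NumPy arrays, np.var defaults to ddof=0, so ddof=1 is passed explicitly to match the n - 1 normalisation that np.cov uses by default.

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Covariance divided by the product of the standard deviations
r_manual = np.cov(x, y)[0, 1] / (np.var(x, ddof=1) * np.var(y, ddof=1)) ** 0.5
r_scipy, p_value = pearsonr(x, y)

print(r_manual, r_scipy)
```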

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
