[Python Data Analysis] Distribution of CET-4 Scores
Recently I obtained some CET-4 score data, a little over 500 records. I wondered whether these scores follow the so-called normal distribution, so I tried it out, and this article is the result.
This article covers some basic usage of the xlrd module and shows how to draw a line chart and histograms with matplotlib, for both custom data and randomly generated data. It also collects a few links that can serve as resources for learning matplotlib and numpy, which I hope readers will find helpful.
Tools
- Python 3.5
- xlrd module
- numpy module and the modules it depends on (look up the installation method yourself; pip handles most of them)
- matplotlib plotting module
Basic xlrd usage
1. Import the module
import xlrd
2. Open an Excel file to read data
data = xlrd.open_workbook('excelFile.xls')
3. Tips
Draw a line chart
import xlrd
import numpy as np
import matplotlib.pyplot as plt

data = xlrd.open_workbook(r'd:\Python Workspace\Data\cet4.xls')
table = data.sheets()[0]  # sheet 0
col5 = table.col_values(5)[1:]  # scores in column index 5, skipping the header cell

count = [0 for i in range(0, 650)]  # count[s] = number of people with score s
x = [i for i in range(0, 650)]
for i in col5:
    num = int(i)
    count[num] += 1  # tally the number of people per score

plt.xlabel('score')
plt.ylabel('number of people')
plt.title('Distribution of CET-4 scores')
plt.ylim(0, 8)
plt.plot([i for i in range(250, 650) if count[i] != 0],
         [c for c in count[250:] if c != 0],
         linewidth=1)  # draw a line chart
plt.show()
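The counting step in the script above can also be sketched with collections.Counter from the standard library, which avoids preallocating a fixed-size list. This is only an illustrative variant, not the code used for Figure 1, and the scores below are made-up sample values rather than the real CET-4 data.

```python
from collections import Counter

# Hypothetical scores, as floats the way xlrd returns numeric cells
scores = [425.0, 510.0, 425.0, 388.0, 510.0, 425.0]

# Tally how many people got each integer score
count = Counter(int(s) for s in scores)

# Only scores that actually occur appear in the Counter,
# so no filtering of zero counts is needed before plotting
xs = sorted(count)
ys = [count[x] for x in xs]

print(xs)
print(ys)
```

The lists xs and ys could then be passed to plt.plot in place of the two list comprehensions.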
Figure 1
Draw a histogram and compare it with a normal distribution histogram
import xlrd
import numpy as np
from math import sqrt
import matplotlib.pyplot as plt

data = xlrd.open_workbook(r'd:\Python Workspace\Data\cet4.xls')
table = data.sheets()[0]  # sheet 0
col5 = table.col_values(5)[1:]
ha = [int(i) for i in col5]  # score data
mu = np.mean(ha)  # mean
sigma = np.std(ha)  # standard deviation
data = np.random.normal(mu, sigma, 1000)  # generate normally distributed random data
x = np.linspace(0, 700, 1000)
y = (1. / sqrt(2 * np.pi) / sigma) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
plt.hist(data, bins=100, facecolor='g', alpha=0.44)
plt.hist(ha, bins=70, facecolor='r', histtype='stepfilled')
plt.plot(x, y, color='b')  # normal distribution curve
plt.xlabel('score')
plt.ylabel('number of people')
plt.title('Distribution of CET-4 scores')
plt.show()
Figure 2
The mean and standard deviation of the score data come out to 476.743785851 and 104.816562585, respectively.
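These two statistics come from np.mean and np.std in the script. With its default arguments, np.std computes the population standard deviation (dividing by N rather than N - 1), which corresponds to statistics.pstdev in the standard library. A minimal sketch of that cross-check, using made-up values instead of the real scores:

```python
import statistics

# Hypothetical sample, chosen so the results are easy to verify by hand
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(sample)        # arithmetic mean
sigma = statistics.pstdev(sample)   # population standard deviation (divide by N)
sample_sd = statistics.stdev(sample)  # sample standard deviation (divide by N - 1)

print(mu, sigma, sample_sd)
```

For a data set of 500 or more records the two standard deviations differ only slightly, so the choice has little effect on the comparison here.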
As the figure shows, the green histogram is drawn from a normal distribution with $\mu$ = 476.743785851 and $\sigma$ = 104.816562585, while the red one is the distribution of the CET-4 score data. Because the data set is fairly small (a little over 500 records), the fit is rough, but the scores still appear to follow a normal distribution approximately.
For some reason the normal curve does not show up in this figure, even though it draws fine when plotted on its own. I still need to look into this.
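One possible explanation, offered as an assumption rather than something checked against the original figure: the histograms are plotted as counts, with bars reaching into the tens, while the curve y is a probability density whose peak is only about 0.004 for this sigma. On a shared axis the curve would then lie almost flat along the bottom. The sketch below compares the two scales; the bin width of 7 is a hypothetical value picked for illustration.

```python
from math import sqrt, pi, exp

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

mu, sigma = 476.743785851, 104.816562585  # values reported in the article
n_samples = 1000   # size of the random normal sample in the script
bin_width = 7.0    # hypothetical bin width, for illustration only

# Height of the raw density curve at its peak
peak_density = normal_pdf(mu, mu, sigma)

# Expected bar height at the peak when the histogram shows counts:
# density * number of samples * bin width
peak_count = peak_density * n_samples * bin_width

print(peak_density)
print(peak_count)
```

If this is indeed the cause, multiplying y by n_samples * bin_width before calling plt.plot, or passing density=True to both plt.hist calls so the bars are also densities, should bring the curve and the bars onto a comparable scale.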
Parameters for creating a Histogram
Plotting is done through the matplotlib.pyplot library, whose hist function draws a histogram directly.
Usage:
n, bins, patches = plt.hist(arr, bins=10, normed=0, facecolor='black', edgecolor='black', alpha=1, histtype='bar')
hist accepts many parameters, but the seven shown here are the most commonly used. Only the first is required; the rest are optional.
arr: one-dimensional array from which the histogram is computed
bins: number of bars (bins) in the histogram. Optional. The default value is 10.
normed: whether to normalize the resulting histogram vector. The default value is 0. (Newer Matplotlib releases replace this parameter with density.)
facecolor: fill color of the histogram
edgecolor: border color of the histogram
alpha: transparency
histtype: histogram type, one of 'bar', 'barstacked', 'step', 'stepfilled'
Return values:
n: the histogram vector, i.e. the value for each bin. Whether it is normalized is controlled by normed.
bins: the edges of the bins, one more value than the number of bins.
patches: a list of the drawn objects (the bars), one per bin
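To make the relationship between n and bins concrete, here is a dependency-free sketch of equal-width binning. The helper histogram is hypothetical and written only for illustration; it is not Matplotlib's implementation, it assumes the data are not all identical, and the scores are made up.

```python
def histogram(values, bins=10):
    """Return (counts, edges) for equal-width bins spanning min..max of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # bins + 1 edges delimit bins intervals
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value falls in the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, edges

scores = [400, 410, 425, 450, 475, 480, 500, 520, 560, 600]  # hypothetical
n, edges = histogram(scores, bins=4)

print(n)
print(edges)
```

The counts always sum to the number of values, and there is always one more edge than there are counts, which is why the second return value of plt.hist is longer than the first.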
Source: denny
Some links
Matplotlib
Library homepage
Gallery
Matplotlib examples with code; a good way to learn.
Scientific computing with Python
Some tools for scientific computing in Python
xlrd documentation
NumPy methods