I recently got hold of about 500 CET-4 score records and, on a whim, wondered whether these scores follow the so-called normal distribution. I decided to find out, and this article is the result.
Along the way the article introduces some uses of the xlrd module and some matplotlib methods for drawing line charts and histograms of custom and randomly generated data. It also provides a few related links that can serve as resources for learning matplotlib and NumPy. I hope it helps readers.
Tools
- Python 3.5
- xlrd module
- NumPy module and the modules it depends on (look up the installation method yourself; pip handles most of them)
- matplotlib plotting module
Basic usage of xlrd
1. Import the module

    import xlrd

2. Open an Excel file to read data

    data = xlrd.open_workbook('Excelfile.xls')

3. Usage tips
- Get a worksheet

    table = data.sheets()[0]                # get by position in the sheet list
    table = data.sheet_by_index(0)          # get by index
    table = data.sheet_by_name(u'Sheet1')   # get by name

- Get the values of an entire row or column (as a list)

    table.row_values(i)
    table.col_values(i)

- Get the number of rows and columns

    nrows = table.nrows
    ncols = table.ncols

- Loop over the table data row by row

    for i in range(nrows):
        print(table.row_values(i))

- Cells

    cell_A1 = table.cell(0, 0).value
    cell_C4 = table.cell(3, 2).value  # cell(row, col), both zero-based: C4 is row 3, column 2

- Using row and column indexes

    cell_A1 = table.row(0)[0].value
    cell_A2 = table.col(0)[1].value  # column A (index 0), second row (index 1)

- A simple write

    row = 0
    col = 0
    # cell type: 0 empty, 1 string, 2 number, 3 date, 4 boolean, 5 error
    ctype = 1
    value = 'cell value'
    xf = 0  # extended formatting
    table.put_cell(row, col, ctype, value, xf)
    table.cell(0, 0)        # the cell object
    table.cell(0, 0).value  # the value of the cell

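The cell examples above address cells by zero-based (row, column) pairs, which is easy to mix up with spreadsheet labels such as C4. As a rough aid, here is a minimal sketch of a converter. The helper name `a1_to_rowcol` is hypothetical and not part of xlrd; it only assumes that `table.cell` takes the row index first and the column index second.

```python
import re

def a1_to_rowcol(label):
    """Convert a spreadsheet label such as 'C4' to zero-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", label)
    if match is None:
        raise ValueError("not a valid A1-style label: %r" % label)
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters form a bijective base-26 number: A=1 ... Z=26, AA=27
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1

print(a1_to_rowcol("A1"))  # (0, 0)
print(a1_to_rowcol("C4"))  # (3, 2)
```

With such a helper, a call like `table.cell(*a1_to_rowcol("C4")).value` would read cell C4 without counting indexes by hand.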
Draw a line chart

    import xlrd
    import numpy as np
    import matplotlib.pyplot as plt

    data = xlrd.open_workbook('D:\\python Workspace\\data\\cet4.xls')
    table = data.sheets()[0]  # sheet 0
    col5 = table.col_values(5)[1:]  # take the score column and drop the header cell
    count = [0 for i in range(0, 650)]  # initialise count
    x = [i for i in range(0, 650)]
    for i in col5:
        num = int(i)
        count[num] += 1  # count the number of people with each score
    plt.xlabel('Score')
    plt.ylabel('number of people')
    plt.title('distribution of CET-4 Scores')
    plt.ylim(0, 8)
    # Plot only the scores that actually occur
    plt.plot([i for i in range(250, 650) if count[i] != 0],
             [i for i in count[250:] if i != 0], linewidth=1)  # draw a line chart
    plt.show()

Figure 1
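The counting step above relies on a fixed-length list of 650 slots, so a score of 650 or more would raise an IndexError. A possible alternative, sketched below, tallies the scores with `collections.Counter`. The helper names `clean_scores` and `tally` are made up for illustration, and the sample column is hypothetical; the sketch assumes the column looks like the output of `table.col_values(...)`, with a header cell first, numbers stored as floats, and empty cells as empty strings.

```python
from collections import Counter

def clean_scores(col_values):
    """Turn a raw column (header cell first) into a list of integer scores."""
    scores = []
    for cell in col_values[1:]:              # skip the header cell
        if isinstance(cell, (int, float)):
            scores.append(int(cell))
        # anything else (empty string, text) is ignored
    return scores

def tally(scores):
    """Return sorted (score, count) pairs for the scores that occur."""
    return sorted(Counter(scores).items())

# Hypothetical column, shaped like the output of table.col_values(...)
raw = ["score", 425.0, 510.0, "", 425.0, 660.0]
pairs = tally(clean_scores(raw))
xs = [score for score, _ in pairs]
ys = [count for _, count in pairs]
print(xs, ys)  # [425, 510, 660] [2, 1, 1]
```

The two lists can be passed straight to `plt.plot(xs, ys, linewidth=1)`, and no upper bound on the score has to be chosen in advance.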
Draw a histogram and compare it to the normal distribution histogram

    import xlrd
    import numpy as np
    from math import sqrt
    import matplotlib.pyplot as plt

    data = xlrd.open_workbook('D:\\python Workspace\\data\\cet4.xls')
    table = data.sheets()[0]  # sheet 0
    col5 = table.col_values(4)[1:]
    ha = [int(i) for i in col5]  # score data
    mu = np.mean(ha)  # mean
    sigma = np.std(ha)  # standard deviation
    data = np.random.normal(mu, sigma, 1000)  # generate normally distributed random data
    x = np.linspace(0, 700, 1000)
    y = (1. / sqrt(2 * np.pi) / sigma) * np.exp(-((x - mu) ** 2 / (2 * sigma ** 2)))
    plt.hist(data, bins=100, facecolor='g', alpha=0.44)
    plt.hist(ha, bins=70, facecolor='r', histtype='stepfilled')
    plt.plot(x, y, color='b')  # normal distribution curve
    plt.xlabel('score')
    plt.ylabel('number of people')
    # plt.title(...)  # the title string is missing from the source
    plt.show()

Figure 2
The mean and standard deviation of the data are 476.743785851 and 104.816562585, respectively.
As the figure shows, the green histogram is a normal distribution with $\mu$ = 476.743785851 and $\sigma$ = 104.816562585, and the red one is the distribution of the CET-4 scores. Because there is little data (just over 500 records), the fit is poor, but the scores can still be seen to roughly follow a normal distribution.
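Beyond comparing the two histograms by eye, one rough numerical check is the empirical rule: for normally distributed data, about 68%, 95% and 99.7% of the values lie within 1, 2 and 3 standard deviations of the mean. The sketch below is only an illustration of that idea, not the article's method. The function name `coverage` is hypothetical, and because the real scores are not reproduced here it runs on synthetic data drawn with the mean and standard deviation reported above.

```python
import random
import statistics

def coverage(values, k):
    """Fraction of values within k population standard deviations of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)   # population std, like the default of np.std
    inside = sum(1 for v in values if abs(v - mu) <= k * sigma)
    return inside / len(values)

# Synthetic stand-in for the real scores (an assumption for illustration):
# 5000 draws from a normal distribution with the article's mean and std.
rng = random.Random(0)
sample = [rng.gauss(476.74, 104.82) for _ in range(5000)]

for k in (1, 2, 3):
    print(k, round(coverage(sample, k), 3))
```

Applied to the real score list (`ha` in the script above), fractions far from 0.68, 0.95 and 0.997 would suggest that the normal model fits poorly. This is a quick sanity check rather than a formal normality test.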
I do not know why the normal curve does not show up in this figure; when drawn on its own it appears fine. This remains to be investigated.
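A likely explanation is a difference in scale rather than a drawing failure. The two histograms show raw counts per bin, while the curve is a probability density whose peak is 1/(σ√(2π)), so on the same axes it lies almost flat along the bottom. The sketch below computes the density and rescales it to expected counts per bin. The function names are hypothetical, and the sample size of 500 and the bin width of 10 are assumed values chosen only for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def expected_counts(xs, mu, sigma, n, bin_width):
    """Scale the density so it is comparable with a histogram of raw counts."""
    return [n * bin_width * normal_pdf(x, mu, sigma) for x in xs]

mu, sigma = 476.743785851, 104.816562585   # values reported in the article
peak_density = normal_pdf(mu, mu, sigma)
print(peak_density)   # well below 0.01, so invisible next to bar heights

# Assumed for illustration: 500 scores in bins that are 10 points wide.
xs = [i * 7.0 for i in range(101)]          # 0, 7, ..., 700
ys = expected_counts(xs, mu, sigma, n=500, bin_width=10.0)
print(max(ys))
```

Passing `xs` and `ys` to `plt.plot` would then put the curve on the same scale as a count histogram with that bin width. Since the two histograms in the script use different sample sizes and bin widths, it may be simpler to normalize both of them instead (the `normed` parameter described below, named `density` in newer Matplotlib releases) and plot the unscaled density.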
Explanation of some histogram parameters
Plots can be drawn with the matplotlib.pyplot library, whose hist function draws a histogram directly.
Call signature:

    n, bins, patches = plt.hist(arr, bins=10, normed=0, facecolor='black', edgecolor='black', alpha=1, histtype='bar')

hist accepts many parameters, but the ones below are the most commonly used. Only the first is required; the rest are optional.
arr: the one-dimensional array whose histogram is to be computed
bins: number of bars in the histogram, optional, default 10
normed: whether to normalize the resulting histogram vector, default 0 (newer Matplotlib releases replace this parameter with density)
facecolor: fill color of the bars
edgecolor: border color of the bars
alpha: transparency
histtype: histogram type, one of 'bar', 'barstacked', 'step', 'stepfilled'
Return values:
n: the histogram vector; whether it is normalized is controlled by the normed parameter
bins: the bin edges, that is, the interval covered by each bin
patches: the shapes (patch objects) drawn for the bins, returned as a list
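To make the first two return values more concrete, here is a small pure-Python sketch of what `n` and `bins` contain for equal-width bins. It is an illustration of the semantics only, not Matplotlib's implementation. The function name `histogram` is hypothetical, and the sketch assumes the values are not all equal and that the last bin includes its right edge.

```python
def histogram(values, bins=10, density=False):
    """Return (n, edges) for equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal in this sketch")
    width = (hi - lo) / bins
    edges = [lo + k * width for k in range(bins + 1)]   # bins + 1 edges
    counts = [0] * bins
    for v in values:
        # The maximum value falls into the last bin, which is closed on the right
        k = min(int((v - lo) / width), bins - 1)
        counts[k] += 1
    if density:
        # Normalized so that the bar areas sum to 1
        total = len(values)
        return [c / (total * width) for c in counts], edges
    return counts, edges

n, edges = histogram([1, 2, 2, 3, 4], bins=3)
print(n)      # [1, 2, 2]
print(edges)  # [1.0, 2.0, 3.0, 4.0]
```

Note that there is always one more edge than there are bins, and that with normalization the values in `n` are densities whose areas, not heights, add up to 1.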
Excerpted from a post by Denny.
Some links
- Matplotlib: home page of the library
- Gallery: examples of matplotlib plots with their code, a good learning resource
- Scientific Computing with Python: some tools for scientific computing in Python
- xlrd documentation
- Some methods of NumPy
[Python data analysis] CET-4 score distribution: applications of matplotlib and xlrd