"Python Data Analysis": CET-4 score distribution with matplotlib and xlrd

Last Update:2016-04-14 Source: Internet

Author: User



I recently obtained about 500 CET-4 score records and wondered, on a whim, whether they follow the so-called normal distribution. So I tried it, and this article is the result.
Along the way, the article introduces some usage of the xlrd module and some matplotlib methods for drawing a line chart and histograms of custom data and of random data. It also lists some related links that can serve as resources for learning matplotlib and NumPy. I hope it helps readers.


Tools

Python 3.5
xlrd module
NumPy module and its dependencies (look up the installation method for your platform; pip handles most cases)
matplotlib plotting module

Basic xlrd usage

1. Import the module

    import xlrd

2. Open an Excel file to read data

    data = xlrd.open_workbook('Excelfile.xls')

3. Usage tips

Get a worksheet

    table = data.sheets()[0]               # get by index order
    table = data.sheet_by_index(0)         # get by index order
    table = data.sheet_by_name(u'Sheet1')  # get by name

Get values for entire row and column (array)

    table.row_values(i)
    table.col_values(i)

Get the number of rows and columns

    nrows = table.nrows
    ncols = table.ncols

Loop over the rows of the table

    for i in range(nrows):
        print(table.row_values(i))

Cell

    # table.cell takes (row index, column index), both counted from 0
    cell_A1 = table.cell(0, 0).value
    cell_C4 = table.cell(3, 2).value

Using row and column indexes

    cell_A1 = table.row(0)[0].value
    cell_A2 = table.col(0)[1].value
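The scripts later in this article convert every value of a column with int(i). A minimal sketch of that step follows, assuming that numeric cells come back from col_values as floats and empty cells as empty strings. The helper column_as_ints and the FakeSheet stand-in are hypothetical names used only for illustration; with a real workbook you would pass a sheet obtained from xlrd instead.

```python
def column_as_ints(sheet, col_index, header_rows=1):
    """Return the numeric cells of one column as ints, skipping header rows."""
    values = sheet.col_values(col_index)[header_rows:]
    scores = []
    for value in values:
        # Keep numeric cells only; text and empty cells are skipped
        # instead of making int() raise an exception.
        if isinstance(value, (int, float)):
            scores.append(int(value))
    return scores


class FakeSheet:
    """Hypothetical stand-in for an xlrd sheet, used only for this demo."""

    def __init__(self, columns):
        self._columns = columns

    def col_values(self, col_index):
        return list(self._columns[col_index])


# One column: a header, three numeric cells, one empty cell, one text cell.
demo_sheet = FakeSheet([["score", 425.0, "", 510.0, "absent", 388.0]])
demo_scores = column_as_ints(demo_sheet, 0)
print(demo_scores)
```

Skipping non-numeric cells is a design choice for this sketch; depending on the data, it may be better to report such rows than to drop them silently.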

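A simple write

    row = 0
    col = 0
    # Cell type: 0 empty, 1 string, 2 number, 3 date, 4 boolean, 5 error
    ctype = 1
    value = 'Cell value'
    xf = 0  # extended formatting
    # put_cell changes the sheet in memory only; xlrd does not save files
    table.put_cell(row, col, ctype, value, xf)
    table.cell(0, 0)        # the cell object
    table.cell(0, 0).value  # the cell value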

Draw a line chart

    import xlrd
    import numpy as np
    import matplotlib.pyplot as plt

    data = xlrd.open_workbook('D:\\python Workspace\\data\\cet4.xls')

    table = data.sheets()[0]  # sheet 0

    col5 = table.col_values(5)[1:]  # take the score column and drop the column header

    count = [0 for i in range(0, 650)]  # initialize count
    x = [i for i in range(0, 650)]

    for i in col5:
        num = int(i)
        count[num] += 1  # count the number of people with each score

    plt.xlabel('Score')
    plt.ylabel('number of people')
    plt.title('distribution of CET-4 Scores')
    plt.ylim(0, 8)
    # Draw a line chart of the scores that occur at least once
    plt.plot([i for i in range(250, 650) if count[i] != 0],
             [i for i in count[250:] if i != 0], linewidth=1)
    plt.show()

Figure 1
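The counting step above uses a list of fixed length 650, so a score of 650 or higher would raise an IndexError. As a hedged alternative, the sketch below counts with collections.Counter, which needs no upper bound. The function name score_frequencies and the sample values are made up for illustration and are not the article's data.

```python
from collections import Counter


def score_frequencies(scores):
    """Return (xs, ys): each distinct score and how many people obtained it."""
    counts = Counter(int(s) for s in scores)
    xs = sorted(counts)
    ys = [counts[x] for x in xs]
    return xs, ys


# Made-up sample values; xlrd would deliver them as floats.
sample_scores = [425.0, 510.0, 425.0, 388.0, 690.0, 510.0, 425.0]
xs, ys = score_frequencies(sample_scores)
print(xs)
print(ys)
```

The two lists can then be passed straight to plt.plot(xs, ys, linewidth=1), which plots only the scores that actually occur, as the script above does.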

Draw a histogram and compare it to the normal distribution histogram

    import xlrd
    import numpy as np
    from math import sqrt
    import matplotlib.pyplot as plt

    data = xlrd.open_workbook('D:\\python Workspace\\data\\cet4.xls')

    table = data.sheets()[0]  # sheet 0

    col5 = table.col_values(4)[1:]  # score column without the header row

    ha = [int(i) for i in col5]  # score data
    mu = np.mean(ha)  # mean
    sigma = np.std(ha)  # standard deviation
    data = np.random.normal(mu, sigma, 1000)  # normally distributed random data

    x = np.linspace(0, 700, 1000)
    y = (1. / sqrt(2 * np.pi) / sigma) * np.exp(-((x - mu) ** 2 / (2 * sigma ** 2)))

    plt.hist(data, bins=100, facecolor='g', alpha=0.44)
    plt.hist(ha, bins=70, facecolor='r', histtype='stepfilled')
    plt.plot(x, y, color='b')  # normal distribution curve
    plt.xlabel('score')
    plt.ylabel('number of people')
    # plt.title(...)  # the title text is missing from the source
    plt.show()

Figure 2
The mean and standard deviation of the data are 476.743785851 and 104.816562585, respectively.
As the figure shows, the green histogram is a normal distribution with $\mu$=476.743785851 and $\sigma$=104.816562585, and the red histogram is the distribution of the CET-4 scores. Because there is so little data (just over 500 records) the fit is poor, but the scores can still be seen to follow a roughly normal distribution.
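The visual comparison can be backed by a simple numeric check. For a normal distribution, about 68.3%, 95.4% and 99.7% of the values lie within one, two and three standard deviations of the mean. The sketch below computes those fractions for a list of scores. The function name fraction_within and the sample list are made up for illustration, and this is only a rough check, not a formal normality test.

```python
from statistics import mean, pstdev


def fraction_within(scores, k):
    """Fraction of scores lying within k standard deviations of the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation, like np.std by default
    inside = sum(1 for s in scores if abs(s - mu) <= k * sigma)
    return inside / len(scores)


# Made-up sample, not the article's data.
sample = [300, 380, 420, 450, 470, 480, 500, 530, 570, 700]
for k in (1, 2, 3):
    print(k, fraction_within(sample, k))
```

Fractions far from 0.683, 0.954 and 0.997 would suggest that the data deviate from a normal distribution, although with a few hundred records some deviation is expected.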
For some reason the normal curve does not show up in this figure, although it can be drawn when plotted on its own. This remains to be investigated.
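A plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is a difference in scale. The histograms show counts per bin, while the curve is a probability density whose peak is 1/(sigma*sqrt(2*pi)), roughly 0.0038 for the sigma reported above, so on a count axis it lies almost flat along the x-axis. The sketch below rescales the density to expected counts by multiplying by the number of samples and the bin width. The bin width of 7 score points and the helper names are assumptions for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def expected_count(x, mu, sigma, n_samples, bin_width):
    """Density rescaled to the units of a count histogram."""
    return normal_pdf(x, mu, sigma) * n_samples * bin_width


mu, sigma = 476.743785851, 104.816562585  # values reported in the article
peak_density = normal_pdf(mu, mu, sigma)
# Assumed: 1000 random samples and bins about 7 score points wide.
peak_count = expected_count(mu, mu, sigma, n_samples=1000, bin_width=7.0)
print(peak_density)
print(peak_count)
```

Plotting the rescaled values instead of the raw density should make the curve visible next to the count histogram. An alternative is to normalize the histograms themselves so that they share the density scale of the curve.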

Explanation of some histogram parameters

Plots are drawn by calling the matplotlib.pyplot library, whose hist function draws a histogram directly.

Usage:

    n, bins, patches = plt.hist(arr, bins=10, normed=0, facecolor='black', edgecolor='black', alpha=1, histtype='bar')

hist takes many parameters, but the seven below are the most commonly used. Only the first is required; the rest are optional.

arr: the one-dimensional array of data from which the histogram is computed

bins: number of histogram bars; optional, default 10

normed: whether the resulting histogram vector is normalized; default 0. Newer matplotlib releases replace this parameter with density.

facecolor: fill color of the bars

edgecolor: border color of the bars

alpha: transparency

histtype: histogram type, one of 'bar', 'barstacked', 'step', 'stepfilled'

Return values:

n: the histogram vector (the value of each bin); whether it is normalized depends on the normed parameter

bins: the edges of the bins, that is, the interval covered by each bin

patches: a list of the drawn bar objects, one per bin
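To make the first two return values concrete, here is a pure-Python sketch of equal-width binning. It is an illustration of the idea, not matplotlib's implementation, and the function name histogram and its density flag are hypothetical. It follows the convention that every bin is half-open except the last one, which also includes its right edge.

```python
def histogram(values, bins=10, density=False):
    """Return (n, edges) for equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values equal: widen the range so the bin width is not zero.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    n = [0] * bins
    for v in values:
        # The maximum value falls on the last edge and goes into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        n[index] += 1
    if density:
        # Normalize so that the bar areas sum to 1.
        n = [c / (len(values) * width) for c in n]
    return n, edges


counts, edges = histogram([1, 2, 2, 3, 5], bins=4)
print(counts)
print(edges)
```

With bins bars there are always bins + 1 edges, which is why the bins array returned by hist is one element longer than n.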

The parameter explanation above is excerpted from a post by Denny.

Some links

Matplotlib

Home page of the library
Gallery

The matplotlib examples and their code are good learning material.
Scientific computing with Python

Some tools for scientific computing with Python.
xlrd documentation
Some NumPy methods


