"Python Data Analysis": CET-4 Score Distribution with matplotlib and xlrd

Source: Internet
Author: User

I recently got hold of about 500 CET-4 (College English Test Band 4) scores, and on a whim wondered whether they follow the so-called normal distribution. I tried it out, and this article is the result.
Along the way, the article introduces some basic usage of the xlrd module and some matplotlib methods for plotting line charts and histograms of your own data and of randomly generated data. It also lists some related links that can serve as resources for learning matplotlib and NumPy. I hope it helps.


Tools
    • Python 3.5
    • xlrd module
    • NumPy and the modules it depends on (install them however you prefer; pip handles most of them)
    • matplotlib plotting module
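A minimal sketch for checking that the modules above can be imported before running the scripts. It uses only the standard library (`importlib.util.find_spec`); the helper name `check_modules` is hypothetical. Note that xlrd 2.0 and later read only legacy `.xls` files, which matches the `cet4.xls` file used below; `.xlsx` files would need a different library.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(["xlrd", "numpy", "matplotlib"]).items():
        print(name, "found" if found else "missing (try: pip install " + name + ")")
```
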
Basic xlrd usage
1. Import the module
import xlrd
2. Open an Excel file to read its data
data = xlrd.open_workbook('excelfile.xls')
3. Common operations
  • Get a worksheet

    table = data.sheets()[0]               # get it by index order
    table = data.sheet_by_index(0)         # get it by index order
    table = data.sheet_by_name(u'Sheet1')  # get it by name
  • Get the values of an entire row or column (as a list)

    table.row_values(i)
    table.col_values(i)
  • Get the number of rows and columns

    nrows = table.nrows
    ncols = table.ncols
  • Loop over the table's rows

    for i in range(nrows):
        print(table.row_values(i))
  • Get a single cell

    cell_A1 = table.cell(0, 0).value  # row index 0, column index 0
    cell_C4 = table.cell(3, 2).value  # row index 3, column index 2
  • Use row and column indexes

    cell_A1 = table.row(0)[0].value  # row 0, then its first cell
    cell_A2 = table.col(0)[1].value  # column 0 (A), then its second cell
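  • A simple write (this only changes the sheet in memory; xlrd cannot save files)

    row = 0
    col = 0
    # cell type: 0 empty, 1 string, 2 number, 3 date, 4 boolean, 5 error
    ctype = 1
    value = 'cell value'
    xf = 0  # extended formatting (XF) index
    table.put_cell(row, col, ctype, value, xf)
    table.cell(0, 0)        # the cell object, now holding 'cell value'
    table.cell(0, 0).value  # 'cell value'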
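xlrd addresses cells by zero-based (rowx, colx) pairs, so the spreadsheet name C4 corresponds to `table.cell(3, 2)`. A minimal sketch of converting A1-style names into those indexes; `a1_to_index` is a hypothetical helper, not part of xlrd.

```python
import re


def a1_to_index(name):
    """Convert an A1-style cell name such as 'C4' to xlrd-style (rowx, colx)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", name)
    if match is None:
        raise ValueError("not an A1-style cell name: %r" % name)
    letters, digits = match.groups()
    colx = 0
    for ch in letters.upper():
        # column letters are base 26 with digits A=1 .. Z=26
        colx = colx * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, colx - 1


print(a1_to_index("A1"))    # (0, 0)
print(a1_to_index("C4"))    # (3, 2)
print(a1_to_index("AA10"))  # (9, 26)
```

With an open sheet, `table.cell(*a1_to_index('C4')).value` would then read cell C4.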
Draw a line chart
import xlrd
import matplotlib.pyplot as plt

data = xlrd.open_workbook('D:\\python Workspace\\data\\cet4.xls')

table = data.sheets()[0]  # sheet 0

col5 = table.col_values(5)[1:]  # take the scores from column index 5 and drop the header cell

# count[s] is the number of people with score s (CET-4 scores range from 0 to 710)
count = [0] * 711
for i in col5:
    num = int(i)
    count[num] += 1  # tally the people with each score

plt.xlabel('score')
plt.ylabel('number of people')
plt.title('distribution of CET-4 scores')
plt.ylim(0, 8)
# line chart of the scores from 250 up that at least one person obtained
scores = [s for s in range(250, 711) if count[s] != 0]
plt.plot(scores, [count[s] for s in scores], linewidth=1)
plt.show()
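The counting loop above builds the frequency table by hand. A minimal sketch of the same counting with NumPy's `np.bincount`, assuming whole-number scores from 0 to 710; the helper names and the sample scores are made up for illustration and are not the article's data.

```python
import numpy as np


def score_counts(scores, max_score=710):
    """Return an array where counts[s] is the number of people with score s."""
    scores = np.asarray(scores, dtype=float).astype(int)  # xlrd returns floats; truncate like int()
    return np.bincount(scores, minlength=max_score + 1)


def nonzero_points(counts, low=250):
    """Return (scores, people) for the scores >= low that at least one person obtained."""
    xs = np.nonzero(counts[low:])[0] + low
    return xs, counts[xs]


# illustrative scores only, not the CET-4 data
sample = [425.0, 425.0, 480.0, 512.0, 230.0]
counts = score_counts(sample)
xs, ys = nonzero_points(counts)
print(xs.tolist(), ys.tolist())  # [425, 480, 512] [2, 1, 1]
```

`plt.plot(xs, ys, linewidth=1)` would then draw the same kind of line chart. `np.bincount` does the tally in one vectorized call instead of a Python loop.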

Figure 1

Draw a histogram and compare it with a histogram of normally distributed data
import xlrd
import numpy as np
import matplotlib.pyplot as plt

data = xlrd.open_workbook('D:\\python Workspace\\data\\cet4.xls')

table = data.sheets()[0]  # sheet 0

col5 = table.col_values(4)[1:]  # scores, without the header cell

ha = [int(i) for i in col5]  # score data
mu = np.mean(ha)  # mean
sigma = np.std(ha)  # standard deviation
data = np.random.normal(mu, sigma, 1000)  # generate normally distributed random data

x = np.linspace(0, 700, 1000)
y = (1. / np.sqrt(2 * np.pi) / sigma) * np.exp(-((x - mu) ** 2 / (2 * sigma ** 2)))  # normal density

plt.hist(data, bins=100, facecolor='g', alpha=0.44)  # green: random normal data
plt.hist(ha, bins=70, facecolor='r', histtype='stepfilled')  # red: CET-4 scores
plt.plot(x, y, color='b')  # normal distribution curve
plt.xlabel('score')
plt.ylabel('number of people')
plt.show()
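The array `y` in the script above evaluates the normal probability density function, using the sample mean $\mu$ and standard deviation $\sigma$ of the scores:

```latex
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```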

Figure 2
The mean and standard deviation of the data are 476.743785851 and 104.816562585, respectively.
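`np.std` computes the population standard deviation by default (`ddof=0`, dividing by N); passing `ddof=1` divides by N - 1 and gives the sample standard deviation. For just over 500 scores the two differ by only about 0.1%, so the choice hardly matters here. A small check on a made-up array:

```python
import numpy as np

scores = np.array([1.0, 2.0, 3.0, 4.0])  # tiny illustrative array, not the CET-4 data

population_sd = np.std(scores)       # divides the squared deviations by N
sample_sd = np.std(scores, ddof=1)   # divides by N - 1 instead
print(population_sd, sample_sd)      # roughly 1.118 and 1.291

# ratio of the two estimates for N = 500 values is sqrt(N / (N - 1))
ratio_500 = np.sqrt(500 / 499)
print(ratio_500)                     # roughly 1.001, i.e. about 0.1%
```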
As the figure shows, the green histogram is normally distributed random data with $\mu = 476.743785851$ and $\sigma = 104.816562585$, and the red one is the distribution of the CET-4 scores. Because there is relatively little data (just over 500 scores), the fit is rough, but the scores still appear to follow a roughly normal distribution.
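The visual comparison can be backed by a rough numeric check: for a normal distribution, about 68.3%, 95.4% and 99.7% of values lie within 1, 2 and 3 standard deviations of the mean. A minimal sketch comparing observed and expected fractions; `within_k_sigma` is a hypothetical helper, and because the original scores are not available the demo uses random normal data with the $\mu$ and $\sigma$ above. This is only a sanity check, not a formal normality test.

```python
import math

import numpy as np


def within_k_sigma(values, ks=(1, 2, 3)):
    """Map each k to (observed, expected) fractions of values within k std devs of the mean."""
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std()
    result = {}
    for k in ks:
        observed = float(np.mean(np.abs(values - mu) <= k * sigma))
        expected = math.erf(k / math.sqrt(2))  # P(|Z| <= k) for a standard normal Z
        result[k] = (observed, expected)
    return result


rng = np.random.RandomState(0)  # fixed seed so the demo is repeatable
demo = rng.normal(476.743785851, 104.816562585, 500)
for k, (obs, exp) in within_k_sigma(demo).items():
    print("within %d sigma: observed %.3f, expected %.3f" % (k, obs, exp))
```

With the real data, `within_k_sigma(ha)` would run the same comparison on the scores list from the histogram script.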
For some reason the normal curve does not show up in this plot, although it does appear when drawn on its own. This remains to be investigated.
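One likely explanation, offered as a guess rather than a confirmed diagnosis: `y` is a probability density, so its maximum, $1/(\sqrt{2\pi}\,\sigma)$, is only about 0.0038 for $\sigma \approx 104.8$, whereas the histograms are on a count scale (people per bar). The curve is then probably drawn but pressed flat against the x-axis; plotted alone, the y-axis rescales and the curve shows up. A minimal sketch of putting the density on the count scale by multiplying it by the number of values and the bin width; `normal_pdf` and `scaled_normal_curve` are hypothetical helpers, and the 70 bins spanning 200 to 700 points are an assumed example, not measured from the article's data.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Normal probability density, the same formula as y in the histogram script."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)


def scaled_normal_curve(x, mu, sigma, n_values, bin_width):
    """Expected histogram count per bin: density * number of values * bin width."""
    return normal_pdf(x, mu, sigma) * n_values * bin_width


mu, sigma = 476.743785851, 104.816562585
x = np.linspace(0, 700, 1000)

peak_density = normal_pdf(mu, mu, sigma)
print("peak of the density: %.4f" % peak_density)  # 0.0038

# assumed example: 500 scores in 70 bins spanning 200..700 points
bin_width = (700 - 200) / 70
counts_curve = scaled_normal_curve(x, mu, sigma, 500, bin_width)
print("peak of the scaled curve: %.1f" % counts_curve.max())
```

To overlay it on the CET-4 histogram, the bin width can be read from the edges returned by `plt.hist` (for example `bins[1] - bins[0]`) and passed along with `len(ha)`. Alternatively, passing `density=True` (`normed=1` in older matplotlib) to both `plt.hist` calls puts the bars on the density scale, where the unscaled `y` fits directly.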

Explanation of some histogram parameters

Plots are drawn by calling the matplotlib.pyplot module, whose hist function draws histograms directly.

Usage:

n, bins, patches = plt.hist(arr, bins=10, normed=0, facecolor='black', edgecolor='black', alpha=1, histtype='bar')

hist has many parameters, but these are the commonly used ones. Only the first is required; the rest are optional.

arr: the one-dimensional array whose histogram is computed

bins: the number of histogram bars (bins); optional, default 10

normed: whether to normalize the resulting histogram vector; default 0 (no normalization). Newer matplotlib versions use density instead of normed.

facecolor: fill color of the bars

edgecolor: border color of the bars

alpha: transparency

histtype: histogram type, one of 'bar', 'barstacked', 'step', 'stepfilled'

Return values:

n: the histogram vector (the value of each bin), normalized or not according to normed

bins: the bin edges, one more than the number of bins

patches: a list of the patch objects (the drawn bars), one per bin
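A minimal sketch of what the three return values hold, run off-screen with matplotlib's Agg backend so no window is needed. The array is a small made-up example, and the second call uses `density`, the name newer matplotlib versions use in place of `normed`.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window needed
import matplotlib.pyplot as plt
import numpy as np

arr = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])  # illustrative data only

n, bins, patches = plt.hist(arr, bins=4, facecolor="g", edgecolor="black",
                            alpha=0.5, histtype="bar")
print(n)             # counts per bin: 1, 2, 3, 3
print(bins)          # 5 bin edges for 4 bins: 1, 2, 3, 4, 5
print(len(patches))  # 4 bar objects, one per bin

# with density=True the bar areas sum to 1 instead of the counts summing to len(arr)
d, d_bins, _ = plt.hist(arr, bins=4, density=True)
print((d * np.diff(d_bins)).sum())  # 1.0 up to rounding
plt.close("all")
```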

The parameter explanations above are excerpted from Denny.

Some links

  • Matplotlib: the library's home page
  • Gallery: examples of matplotlib plots with their code; a good way to learn
  • Scientific computing with Python: an overview of Python tools for scientific computing
  • xlrd documentation
  • Some NumPy methods
