# [Python Data Analysis] Distribution of CET-4 Scores



Recently I got hold of some CET-4 (College English Test Band 4) score data, a little over 500 records, and wondered whether the scores follow the so-called normal distribution. The best way to find out was to try it, and this article is the result.
This article introduces basic usage of the xlrd module, shows how to use matplotlib to draw a line chart and histograms of custom data and of randomly generated data, and provides some links that can serve as resources for learning matplotlib and numpy. I hope it is helpful to readers.

Tools
• Python 3.5
• xlrd module
• numpy module and the modules it depends on (look up the installation steps yourself; most can be installed with pip)
• matplotlib plotting module
xlrd basic usage

1. Import the module
```python
import xlrd
```
2. Open an Excel file to read data
```python
data = xlrd.open_workbook('excelFile.xls')
```
3. Tips
• Get a worksheet

```python
table = data.sheets()[0]               # get a sheet by index
table = data.sheet_by_index(0)         # get a sheet by index
table = data.sheet_by_name(u'sheet1')  # get a sheet by name
```
• Get the values of an entire row or column (as a list)

```python
table.row_values(i)  # values of row i
table.col_values(i)  # values of column i
```

• Get the number of rows and columns

```python
nrows = table.nrows
ncols = table.ncols
```
• Loop over the rows and print each one

```python
for i in range(nrows):
    print(table.row_values(i))  # Python 3 print function
```
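Because `row_values` returns a plain Python list, the row loop is also a convenient place to pull out one numeric column. The sketch below assumes the rows have already been collected into a list of lists (for example `[table.row_values(i) for i in range(nrows)]`), that the first row is a header, and that empty cells come back as empty strings, which is how xlrd reports them. `numeric_column` is a hypothetical helper, not part of xlrd.

```python
def numeric_column(rows, col, has_header=True):
    """Collect the numeric values of column `col` as ints.

    `rows` is a list of row lists, e.g. table.row_values(i) for every
    row index i. Empty cells (''), text cells and short rows are skipped.
    """
    start = 1 if has_header else 0
    values = []
    for row in rows[start:]:
        if col >= len(row):
            continue  # this row has no cell in the requested column
        cell = row[col]
        # xlrd returns numeric cells as floats; keep only numbers
        if isinstance(cell, (int, float)):
            values.append(int(cell))
    return values


rows = [['name', 'score'], ['a', 425.0], ['b', ''], ['c', 510.0]]
print(numeric_column(rows, 1))  # [425, 510]
```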
• Get the value of a single cell

```python
cell_A1 = table.cell(0, 0).value  # cell(row index, column index), both zero-based
cell_C4 = table.cell(3, 2).value  # row 4 is index 3, column C is index 2
```
• Get a cell through its row or column

```python
cell_A1 = table.row(0)[0].value  # first cell of row 0
cell_A2 = table.col(0)[1].value  # second cell of column 0
```
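Since `cell`, `row` and `col` all take zero-based indices with the row first, spreadsheet references such as C4 are easy to mix up. A small hypothetical helper (`a1_to_indices`, not part of xlrd) makes the conversion explicit; it assumes plain references without `$` signs or sheet names.

```python
import re


def a1_to_indices(ref):
    """Convert an Excel-style reference such as 'C4' into the zero-based
    (row, col) pair expected by table.cell(row, col)."""
    match = re.fullmatch(r'([A-Za-z]+)([1-9][0-9]*)', ref)
    if match is None:
        raise ValueError('not an A1-style reference: %r' % ref)
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # column letters form a base-26 number with digits A=1 .. Z=26
        col = col * 26 + (ord(ch) - ord('A') + 1)
    return int(digits) - 1, col - 1


print(a1_to_indices('A1'))    # (0, 0)
print(a1_to_indices('C4'))    # (3, 2)
print(a1_to_indices('AA10'))  # (9, 26)
```

With a sheet open, `table.cell(*a1_to_indices('C4')).value` would then read cell C4.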
• Simple writing (this only changes the sheet in memory; xlrd cannot save files)

```python
row = 0
col = 0
# ctype: 0 empty, 1 string, 2 number, 3 date, 4 boolean, 5 error
ctype = 1
value = 'cell value'
xf = 0  # extended formatting index
table.put_cell(row, col, ctype, value, xf)
table.cell(0, 0)        # the Cell object, now holding 'cell value'
table.cell(0, 0).value  # 'cell value'
```
Draw a line chart
```python
import xlrd
import matplotlib.pyplot as plt

data = xlrd.open_workbook(r'd:\Python Workspace\Data\cet4.xls')
table = data.sheets()[0]            # sheet 0
col5 = table.col_values(5)[1:]      # scores in column index 5, header row removed
count = [0 for i in range(0, 711)]  # one counter per possible score (CET-4 max is 710)
for i in col5:
    num = int(i)
    count[num] += 1                 # count the people with this score
scores = [s for s in range(250, 711) if count[s] != 0]
plt.xlabel('score')
plt.ylabel('number of people')
plt.title('Distribution of CET-4 scores')
plt.ylim(0, 8)
plt.plot(scores, [count[s] for s in scores], linewidth=1)  # draw a line chart
plt.show()
```
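The counting loop relies on a fixed-size list, so any score at or beyond the list's length would raise an IndexError. A minimal alternative sketch uses `collections.Counter`, which stores only the scores that actually occur; `score_counts` is a hypothetical helper name, and its two aligned lists could be passed straight to `plt.plot`.

```python
from collections import Counter


def score_counts(scores, low=250):
    """Count how many people got each score and return two aligned lists
    (scores, counts) sorted by score, keeping only scores >= low."""
    counter = Counter(int(s) for s in scores)
    xs = sorted(s for s in counter if s >= low)
    ys = [counter[s] for s in xs]
    return xs, ys


xs, ys = score_counts([425.0, 425.0, 510.0, 198.0])
print(xs, ys)  # [425, 510] [2, 1]
# with matplotlib: plt.plot(xs, ys, linewidth=1)
```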

Figure 1

Draw a histogram and compare it with a normal distribution histogram
```python
import xlrd
import numpy as np
from math import sqrt
import matplotlib.pyplot as plt

data = xlrd.open_workbook(r'd:\Python Workspace\Data\cet4.xls')
table = data.sheets()[0]                  # sheet 0
col5 = table.col_values(5)[1:]            # scores, header row removed
ha = [int(i) for i in col5]               # score data
mu = np.mean(ha)                          # mean
sigma = np.std(ha)                        # standard deviation
data = np.random.normal(mu, sigma, 1000)  # random data from a normal distribution
x = np.linspace(0, 700, 1000)
y = (1. / sqrt(2 * np.pi) / sigma) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))  # normal density
plt.hist(data, bins=100, facecolor='g', alpha=0.44)
plt.hist(ha, bins=70, facecolor='r', histtype='stepfilled')
plt.plot(x, y, color='b')                 # normal distribution curve
plt.xlabel('score')
plt.ylabel('number of people')
plt.title('Distribution of CET-4 scores')
plt.show()
```

Figure 2
The mean and standard deviation of the scores come out as 476.743785851 and 104.816562585, respectively.
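`np.std` computes the population standard deviation (dividing by N) by default. The same two numbers can be cross-checked without numpy; this sketch uses Python's standard `statistics` module, and `describe` is a hypothetical helper name.

```python
import statistics


def describe(scores):
    """Return (mean, population standard deviation) of the scores.

    statistics.pstdev divides by N, matching numpy.std's default ddof=0;
    statistics.stdev (or np.std(..., ddof=1)) divides by N - 1 instead.
    """
    return statistics.mean(scores), statistics.pstdev(scores)


mu, sigma = describe([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, population std 2
```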
In the figure, the green histogram shows normally distributed random data with $\mu = 476.743785851$ and $\sigma = 104.816562585$, while the red one shows the distribution of the CET-4 scores. The sample is fairly small (just over 500 records), so the fit is not close, but the scores do roughly follow a normal distribution.
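The claim that the scores roughly follow a normal distribution can be made a little more concrete by comparing the share of scores within one, two and three standard deviations of the mean against the roughly 68%, 95% and 99.7% a normal distribution predicts. This is a rough sanity check, not a formal normality test, and it assumes the scores are in a plain Python list; `sigma_coverage` is a hypothetical helper.

```python
import math
import statistics


def sigma_coverage(scores, k):
    """Return (observed, expected): the fraction of scores within k standard
    deviations of the mean, and the fraction a normal distribution predicts."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    inside = sum(1 for s in scores if abs(s - mu) <= k * sigma)
    expected = math.erf(k / math.sqrt(2))  # P(|Z| <= k) for a standard normal
    return inside / len(scores), expected


observed, expected = sigma_coverage([1, 2, 3, 4, 5], 1)
# observed is 0.6 (2, 3 and 4 lie within one sigma); expected is about 0.683
```

Running it for k = 1, 2, 3 on the real score list would show how far the data departs from the normal proportions.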
I don't know why the normal curve does not appear in this combined plot, even though it can be drawn on its own; this remains to be investigated.
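One likely reason the curve seems to vanish: `y` is a probability density, so with $\sigma \approx 104.8$ its peak is only about $1/(\sigma\sqrt{2\pi}) \approx 0.0038$, while both histograms are drawn in raw counts, so the curve is flattened against the x-axis. Multiplying the density by the number of samples times the bin width puts it on the count scale (in matplotlib, `n, bins, patches = plt.hist(ha, bins=70, ...)` gives the bin width as `bins[1] - bins[0]`); alternatively, pass `density=True` to the `hist` calls. The sketch below shows the scaling with the standard library only; the function names are hypothetical, and `n=500`, `bin_width=10` are illustrative round numbers, not values from the article's data.

```python
import math


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution N(mu, sigma**2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def expected_count(x, mu, sigma, n, bin_width):
    """Scale the density to a histogram of n samples with equal-width bins:
    roughly how many samples should fall in the bin around x."""
    return n * bin_width * normal_pdf(x, mu, sigma)


mu, sigma = 476.743785851, 104.816562585  # values reported above
peak_density = normal_pdf(mu, mu, sigma)   # about 0.0038: invisible next to counts
peak_count = expected_count(mu, mu, sigma, n=500, bin_width=10)  # about 19
```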

Parameters of the hist function

Plotting is done with the matplotlib.pyplot library, whose hist function draws a histogram directly.

Usage:

```python
n, bins, patches = plt.hist(arr, bins=10, density=False, facecolor='black', edgecolor='black', alpha=1, histtype='bar')
```

hist accepts many parameters; the ones shown here are the most commonly used. Only the first is required; the rest are optional.

arr: the one-dimensional array of data to compute the histogram from.

bins: the number of bins (bars) in the histogram. Optional; the default is 10.

density: whether to normalize the histogram so that it forms a probability density (the total bar area is 1). The default is False. Older matplotlib versions called this parameter normed, which has since been removed.

facecolor: fill color of the bars.

edgecolor: border color of the bars.

alpha: transparency.

histtype: histogram type, one of 'bar', 'barstacked', 'step', 'stepfilled'.
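When `bins` is an integer, matplotlib computes the counts with `numpy.histogram`: the range from the smallest to the largest value is split into `bins` equal-width intervals, each half-open except the last, which also includes the maximum. The pure-Python sketch below mirrors that rule for illustration only; it ignores weights, explicit ranges and numpy's floating-point edge handling, and `simple_histogram` is a hypothetical name.

```python
def simple_histogram(data, bins=10):
    """Return (counts, edges) for equal-width bins spanning min(data)..max(data).

    Bins are [edge_i, edge_i+1) except the last, which is closed on the right,
    so there are len(counts) + 1 edges.
    """
    lo, hi = min(data), max(data)
    if lo == hi:
        # numpy widens a zero-width range by 0.5 on each side
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        # the maximum value lands in the last bin instead of a bin past the end
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, edges


counts, edges = simple_histogram([1, 2, 2, 3, 4], bins=3)
# counts == [1, 2, 2], edges == [1.0, 2.0, 3.0, 4.0]
```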

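Return values:

n: the histogram values: counts, or normalized densities when density=True (normed in older versions).

bins: the edges of the bins, an array with one more element than the number of bins.

patches: the list of graphical patches (for example Rectangle objects) used to draw the bars.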
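With `density=True` (normed in older matplotlib versions), the returned `n` holds densities rather than counts: each count is divided by the total number of samples times that bin's width, so the bar areas sum to 1. A small sketch of that conversion, applied to counts and bin edges like the `n` and `bins` values `hist` returns; `counts_to_density` is a hypothetical name.

```python
def counts_to_density(counts, edges):
    """Convert histogram counts to densities so that sum(density * width) == 1."""
    total = sum(counts)
    widths = [b - a for a, b in zip(edges[:-1], edges[1:])]
    return [c / (total * w) for c, w in zip(counts, widths)]


density = counts_to_density([1, 2, 2], [1.0, 2.0, 3.0, 4.0])
# density == [0.2, 0.4, 0.4]; the bar areas add up to 1
```

Dividing by each bin's own width, rather than a single shared width, keeps the result correct when the edges are unequal.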

The parameter notes above are from denny.

Links

• Matplotlib library homepage
• Matplotlib gallery (the examples and code are good learning tools)
• Using Python for scientific computing
• Some tools for scientific computing with Python
• xlrd documentation
• NumPy methods
