[Python Data Analysis] CET-4 score distribution

Last Update:2016-04-14 Source: Internet

Author: User

Recently I got hold of some CET-4 score data, a little over 500 records, and wondered whether these scores follow the so-called normal distribution. I decided to find out, hence this article.
This article introduces basic usage of the xlrd module and shows how to use matplotlib to draw a line chart and histograms, both of custom data and of random data. It also provides some links that can serve as resources for learning matplotlib and numpy. I hope it is helpful to readers.

Tools

Python 3.5
xlrd module
numpy module and its dependencies (look up the installation method yourself; pip handles most of it)
matplotlib plotting module

xlrd basic usage

1. Import the module

import xlrd

2. Open an Excel file to read data

data = xlrd.open_workbook('excelFile.xls')

3. Usage tips

Get a worksheet

table = data.sheets()[0]  # by position in the list of sheets
table = data.sheet_by_index(0)  # by index
table = data.sheet_by_name(u'sheet1')  # by name

Get the values of a whole row or a whole column (as a list)

table.row_values(i)
table.col_values(i)

Get the number of rows and columns

nrows = table.nrows
ncols = table.ncols

Loop over the rows

for i in range(nrows):
    print(table.row_values(i))

Read a cell

cell_A1 = table.cell(0, 0).value
cell_C4 = table.cell(3, 2).value  # cell(row, col) is zero-based: C4 is row 3, column 2

Use row and column indexes

cell_A1 = table.row(0)[0].value
cell_A2 = table.col(0)[1].value  # column A, second row
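Because cell() takes a zero-based row index followed by a zero-based column index, it is easy to mix up the two when starting from a spreadsheet name such as C4. The helper below is a minimal sketch of the conversion; a1_to_rowcol is a hypothetical name introduced here for illustration and is not part of xlrd.

```python
import re


def a1_to_rowcol(ref):
    """Convert an A1-style reference such as 'C4' to zero-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError("not an A1-style reference: %r" % ref)
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters behave like a base-26 number with digits A=1 ... Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


print(a1_to_rowcol("A1"))    # (0, 0)
print(a1_to_rowcol("C4"))    # (3, 2)
print(a1_to_rowcol("AA10"))  # (9, 26)
```

With a sheet object at hand, the result could be unpacked straight into the call, for example table.cell(*a1_to_rowcol("C4")).value.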

Simple writing

row = 0
col = 0
# ctype: 0 empty, 1 string, 2 number, 3 date, 4 boolean, 5 error
ctype = 1
value = 'cell value'
xf = 0  # extended formatting
# put_cell changes the in-memory sheet only; xlrd cannot save workbooks
table.put_cell(row, col, ctype, value, xf)
table.cell(0, 0)  # the cell that was just written
table.cell(0, 0).value  # 'cell value'

Draw a line chart

import xlrd
import numpy as np
import matplotlib.pyplot as plt

data = xlrd.open_workbook(r'd:\Python Workspace\Data\cet4.xls')

table = data.sheets()[0]  # sheet 0

col5 = table.col_values(5)[1:]  # scores in column index 5, without the header cell

# count[s] is the number of people with score s (assumes every score is below 650)
count = [0 for i in range(0, 650)]
x = [i for i in range(0, 650)]

for i in col5:
    num = int(i)
    count[num] += 1  # tally the number of people

plt.xlabel('score')
plt.ylabel('number of people')
plt.title('Distribution of CET-4 scores')
plt.ylim(0, 8)
# draw a line chart through the scores that actually occur
plt.plot([i for i in range(250, 650) if count[i] != 0],
         [i for i in count[250:] if i != 0], linewidth=1)
plt.show()

Figure 1
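The fixed-size count list above works as long as every score fits inside it. As an alternative, here is a minimal sketch of the same tally using collections.Counter from the standard library, which needs no upper bound on the score. The function name tally_scores and the sample values are made up for illustration; they are not the article's data.

```python
from collections import Counter


def tally_scores(raw_scores):
    """Return sorted (score, number_of_people) pairs for the scores present."""
    counts = Counter(int(s) for s in raw_scores)
    return sorted(counts.items())


# Made-up sample values, given as floats because xlrd returns numeric cells as floats.
sample = [425.0, 510.0, 425.0, 388.0, 510.0, 425.0]
pairs = tally_scores(sample)
xs = [score for score, _ in pairs]
ys = [n for _, n in pairs]
print(xs)  # [388, 425, 510]
print(ys)  # [1, 3, 2]
```

The two lists xs and ys can then be passed to plt.plot in place of the two list comprehensions.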

Draw a histogram and compare it with a normal distribution histogram

import xlrd
import numpy as np
import matplotlib.pyplot as plt

data = xlrd.open_workbook(r'd:\Python Workspace\Data\cet4.xls')
table = data.sheets()[0]  # sheet 0
col5 = table.col_values(5)[1:]
ha = [int(i) for i in col5]  # score data
mu = np.mean(ha)  # mean
sigma = np.std(ha)  # standard deviation
data = np.random.normal(mu, sigma, 1000)  # random data drawn from a normal distribution
x = np.linspace(0, 700, 1000)
y = (1. / np.sqrt(2 * np.pi) / sigma) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
plt.hist(data, bins=100, facecolor='g', alpha=0.44)
plt.hist(ha, bins=70, facecolor='r', histtype='stepfilled')
plt.plot(x, y, color='b')  # normal distribution curve
plt.xlabel('score')
plt.ylabel('number of people')
plt.title('Distribution of CET-4 scores')
plt.show()

Figure 2
The mean and standard deviation of the data are 476.743785851 and 104.816562585, respectively.
In the figure, the green histogram is drawn from random samples of a normal distribution with $\mu$ = 476.743785851 and $\sigma$ = 104.816562585, and the red one shows the CET-4 score data. The data set is fairly small (a little over 500 records), so the fit is rough, but the scores can be seen to follow a roughly normal distribution.
For a reason I have not worked out, the normal curve does not show up in this figure, although it can be drawn on its own; this is still to be investigated. A likely cause is scale: the curve is a probability density whose peak is $1/(\sigma\sqrt{2\pi})$, about 0.0038, while the histogram bars are counts, so the curve lies almost flat along the x axis.
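One way to put the curve on the same vertical scale as a count histogram is to multiply the density by the number of samples and by the bin width. The following is a minimal sketch in pure Python, not the author's method; the function names are introduced here, and the bin width of 7 is a hypothetical value chosen only for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expected_count(x, mu, sigma, n_samples, bin_width):
    """Approximate number of samples expected in a bin of width bin_width centred at x."""
    return normal_pdf(x, mu, sigma) * n_samples * bin_width


mu, sigma = 476.743785851, 104.816562585  # values reported in the article
n_samples = 1000   # size of the random sample drawn in the script
bin_width = 7.0    # hypothetical bin width, for illustration only

peak_density = normal_pdf(mu, mu, sigma)
peak_count = expected_count(mu, mu, sigma, n_samples, bin_width)
print(round(peak_density, 4))  # 0.0038
print(round(peak_count, 1))    # 26.6
```

The unscaled peak of about 0.0038 is invisible next to bars that are tens of units tall, whereas the scaled value is of the same order as the bar heights. Another option, on Matplotlib 2.1 or later, is to pass density=True to plt.hist so that the histograms themselves are drawn as densities.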

Parameters for creating a Histogram

Plotting is done through the matplotlib.pyplot library, whose hist function draws a histogram directly.

Call method:

n, bins, patches = plt.hist(arr, bins=10, normed=0, facecolor='black', edgecolor='black', alpha=1, histtype='bar')

hist takes many parameters, but these seven are the commonly used ones. Only the first is required; the other six are optional.

arr: the one-dimensional array the histogram is computed from

bins: the number of bars (bins) in the histogram. Optional. The default value is 10.

normed: whether to normalize the resulting histogram vector. The default value is 0 (no normalization). Matplotlib 3.1 and later removed this parameter in favor of density.

facecolor: the fill color of the histogram

edgecolor: the border color of the histogram

alpha: the transparency

histtype: the histogram type, one of 'bar', 'barstacked', 'step', 'stepfilled'

Return values:

n: the histogram vector, that is, the count in each bin; normed controls whether it is normalized.

bins: the edges of the bins, one more value than the number of bins.

patches: the list of patch objects used to draw the histogram.
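To make the first two return values concrete, here is a minimal pure-Python sketch of equal-width binning. It only mimics the counts and edges that hist reports and is not Matplotlib's implementation; the function name equal_width_histogram and the sample values are made up for illustration.

```python
def equal_width_histogram(values, bins=10):
    """Return (counts, edges) for `bins` equal-width bins spanning min..max of values."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are equal; the bin width would be zero")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [float(hi)]
    counts = [0] * bins
    for v in values:
        # The last bin is closed on the right so that max(values) is counted.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts, edges


n, edges = equal_width_histogram([1, 2, 2, 3, 5, 8, 9, 10], bins=3)
print(n)      # [4, 1, 3]
print(edges)  # [1.0, 4.0, 7.0, 10.0]
```

Note that there is one more edge than there are counts, and that the counts add up to the number of input values.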

The parameter description above is taken from denny.

Some links

Matplotlib

Library homepage
Gallery

Matplotlib examples and code are good learning tools.
Use python for scientific computing

Some tools for scientific computing using Python
xlrd documentation
numpy methods

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
