[Python Data Analysis] Distribution of CET-4 scores

Source: Internet
Author: User

Recently I got hold of some CET-4 score data, a little over 500 records. I wondered whether these scores follow the so-called normal distribution, so I tried it out, and this article is the result.
This article covers basic usage of the xlrd module and how to use matplotlib to draw line charts and histograms of custom data and of randomly generated data. It also collects some links that can serve as resources for learning matplotlib and numpy. I hope readers find it helpful.

Tools
  • Python 3.5
  • xlrd module
  • numpy module and its dependencies (look up the installation method yourself; pip handles most of it)
  • matplotlib plotting module
Xlrd basic usage
1. Import Module
import xlrd
2. Open an Excel file to read data
data = xlrd.open_workbook('excelFile.xls')
3. Tips
  • Get a worksheet

    table = data.sheets()[0]               # obtain by index order
    table = data.sheet_by_index(0)         # obtain by index
    table = data.sheet_by_name(u'sheet1')  # obtain by name
  • Get the value of the whole row and the whole column (array)

    table.row_values(i)
    table.col_values(i)
  • Get the number of rows and columns

    nrows = table.nrows
    ncols = table.ncols
  • Loop over the rows and print each one

    for i in range(nrows):
        print(table.row_values(i))
  • Cell

    cell_A1 = table.cell(0, 0).value
    cell_D3 = table.cell(2, 3).value  # cell(row, col) is zero-based: row index 2, column index 3
  • Use row and column Indexes

    cell_A1 = table.row(0)[0].value
    cell_B1 = table.col(1)[0].value  # column index 1 (B), first row
  • Simple Writing

    row = 0
    col = 0
    # Cell types: 0 empty, 1 string, 2 number, 3 date, 4 boolean, 5 error
    ctype = 1
    value = 'cell value'
    xf = 0  # extended formatting
    table.put_cell(row, col, ctype, value, xf)
    table.cell(0, 0)        # the cell
    table.cell(0, 0).value  # 'cell value'
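Because xlrd addresses cells by zero-based (row, column) pairs, a spreadsheet label such as C4 corresponds to row index 3 and column index 2. The helper below is a small sketch of that conversion. The name a1_to_rowcol is hypothetical and is not part of xlrd.

```python
import re

def a1_to_rowcol(label):
    """Convert a spreadsheet label such as 'C4' to zero-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", label)
    if match is None:
        raise ValueError("not a valid cell label: %r" % label)
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters form a base-26 number with digits A=1 .. Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1

print(a1_to_rowcol("A1"))  # (0, 0)
print(a1_to_rowcol("C4"))  # (3, 2)
```

Assuming a table object obtained as shown earlier, table.cell(*a1_to_rowcol('C4')).value would then read cell C4.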
Draw a line chart
import xlrd
import numpy as np
import matplotlib.pyplot as plt

data = xlrd.open_workbook(r'd:\Python Workspace\Data\cet4.xls')

table = data.sheets()[0]  # sheet 0

col5 = table.col_values(5)[1:]  # scores from column index 5, skipping the header cell

count = [0 for i in range(0, 650)]  # initialize the counters
x = [i for i in range(0, 650)]

for i in col5:
    num = int(i)
    count[num] += 1  # count the number of people with each score

plt.xlabel('score')
plt.ylabel('number of people')
plt.title('Distribution of CET-4 scores')
plt.ylim(0, 8)
# Draw a line chart of the non-zero counts for scores from 250 up
plt.plot([i for i in range(250, 650) if count[i] != 0],
         [c for c in count[250:] if c != 0], linewidth=1)
plt.show()

Figure 1
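The counting step above relies on a list of fixed length 650, so a score of 650 or more would raise an IndexError. As an alternative sketch, collections.Counter tallies whatever values actually occur. The scores list here is made-up sample data standing in for col5, not the real CET-4 scores.

```python
from collections import Counter

# Made-up sample values standing in for col5 (xlrd returns numbers as floats).
scores = [425.0, 512.0, 425.0, 388.0, 512.0, 425.0, 655.0]

# Tally how many people obtained each integer score.
count = Counter(int(s) for s in scores)

# Sort by score so the points run left to right on a line chart.
xs = sorted(count)
ys = [count[x] for x in xs]

print(xs)  # [388, 425, 512, 655]
print(ys)  # [1, 3, 2, 1]
```

The two lists can then be passed to plt.plot(xs, ys, linewidth=1) in place of the list comprehensions.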

Draw a histogram and compare it with a normal distribution histogram
import xlrd
import numpy as np
from math import sqrt
import matplotlib.pyplot as plt

data = xlrd.open_workbook(r'd:\Python Workspace\Data\cet4.xls')
table = data.sheets()[0]  # sheet 0
col5 = table.col_values(5)[1:]
ha = [int(i) for i in col5]  # score data
mu = np.mean(ha)  # mean
sigma = np.std(ha)  # standard deviation
data = np.random.normal(mu, sigma, 1000)  # generate normally distributed random data
x = np.linspace(0, 700, 1000)
y = (1. / sqrt(2 * np.pi) / sigma) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
plt.hist(data, bins=100, facecolor='g', alpha=0.44)
plt.hist(ha, bins=70, facecolor='r', histtype='stepfilled')
plt.plot(x, y, color='b')  # normal distribution curve
plt.xlabel('score')
plt.ylabel('number of people')
plt.title('Distribution of CET-4 scores')
plt.show()

Figure 2
The mean and standard deviation of the data come out as 476.743785851 and 104.816562585, respectively.
In the figure, the green histogram shows random data drawn from a normal distribution with $\mu$ = 476.743785851 and $\sigma$ = 104.816562585, while the red one shows the CET-4 score data. The sample is fairly small (a little over 500 records), so the fit is rough, but the scores can still be seen to follow a normal distribution approximately.
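A visual comparison can be backed up with a rough numeric check: a normal distribution puts about 68.3%, 95.4% and 99.7% of its values within one, two and three standard deviations of the mean. The sketch below compares those figures with a data set. The helper names are hypothetical, and the sample is synthetic, generated only so the example runs without the spreadsheet; it is not the real score data.

```python
import math
import random
import statistics

def fraction_within(data, k):
    """Share of values lying within k standard deviations of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population std, like np.std's default
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)

def normal_fraction(k):
    """Share a true normal distribution puts within k standard deviations."""
    return math.erf(k / math.sqrt(2))

# Synthetic stand-in for the score list `ha`; NOT the real CET-4 data.
rng = random.Random(0)
sample = [rng.gauss(476.7, 104.8) for _ in range(500)]

# Print observed share next to the theoretical share for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(fraction_within(sample, k), 3), round(normal_fraction(k), 3))
```

Replacing sample with the real score list would show how far the observed shares drift from the theoretical ones.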
For some reason the normal curve does not show up in this figure, even though it can be drawn on its own; this remains to be investigated.
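One likely explanation, offered here as an assumption rather than something the original post confirms, is a difference in scale. Both histograms show raw counts, with bars many units tall, while the density formula peaks at 1/(sigma * sqrt(2 * pi)), roughly 0.004 for sigma near 104.8. On a shared axis the curve is pressed flat against the x-axis. Multiplying the density by the number of samples and the bin width puts it on the same scale as the counts. The sketch below works through that arithmetic. The sample size of 1000 matches the generated data, and the bin width of 7 score points is an illustrative guess.

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 476.743785851, 104.816562585  # values reported in the article
n_samples = 1000   # size of the generated normal sample
bin_width = 7.0    # assumed width of one histogram bin, in score points

peak_density = normal_pdf(mu, mu, sigma)             # height of the unscaled curve
peak_count = n_samples * bin_width * peak_density    # height on the count scale

print(round(peak_density, 5))  # 0.00381
print(round(peak_count, 1))    # 26.6
```

In the plotting code this would amount to plotting y multiplied by the sample size and the bin width instead of y alone. Alternatively, recent Matplotlib versions accept density=True in plt.hist, which normalizes the bars to the density scale.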

Parameters for creating a Histogram

Plotting is done with the matplotlib.pyplot library, whose hist function draws a histogram directly.

Call method:

n, bins, patches = plt.hist(arr, bins=10, normed=0, facecolor='black', edgecolor='black', alpha=1, histtype='bar')

 

hist takes many parameters, but the seven below are the most commonly used. Only the first one is required; the rest are optional.

arr: one-dimensional array to compute the histogram from

bins: number of bins (bars) in the histogram. Optional; the default is 10.

normed: whether to normalize the resulting histogram vector. The default is 0. (Newer Matplotlib releases replace this parameter with density.)

facecolor: fill color of the bars

edgecolor: border color of the bars

alpha: transparency

histtype: histogram type, one of 'bar', 'barstacked', 'step', 'stepfilled'

Return Value:

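n: the histogram vector, i.e. the count in each bin, or the normalized value if normed is set.

bins: the edges of the bins, an array with one more element than there are bins.

patches: a list of the graphical objects used to draw the histogram (not the data in each bin).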
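To make the first two return values concrete, here is a pure-Python sketch of an equal-width histogram. It only illustrates the idea and is not Matplotlib's actual implementation. The name simple_hist is hypothetical, the sample values are made up, and the sketch assumes the values are not all equal.

```python
def simple_hist(values, bins=10, density=False):
    """Equal-width histogram: returns (heights, edges)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # There is always one more edge than there are bins.
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Bins are half-open, except the last one, which includes the maximum.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    if density:
        # Normalize so that the bar areas sum to 1.
        total = len(values)
        return [c / (total * width) for c in counts], edges
    return counts, edges

values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
n, edges = simple_hist(values, bins=4)
print(n)      # [3, 5, 1, 1]
print(edges)  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

With density=True the heights are divided by the sample size times the bin width, which is the kind of normalization the normed parameter switches on.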

The parameter notes above are taken from denny.

Some links

Matplotlib

Library homepage
Gallery

Matplotlib examples and code are good learning tools.
Use python for scientific computing

Some tools for scientific computing using Python
Xlrd document
Numpy Methods
