Analyzing a school's CET-4 and CET-6 pass results with Python

Source: Internet
Author: User

Recently I have been reading about data analysis and now have a basic understanding of numpy and pandas in Python. I know that if I do not put these skills to use, I will soon forget them. Then I remembered a spreadsheet of our school's CET-4 and CET-6 results that had been shared in a group chat, which is a good way to get familiar with some pandas usage.

1. Data introduction.

The fields in the table are very detailed and include grade, gender, name, scores, and so on. I only want to analyze how many students at our school passed, so I removed the unnecessary fields. The following fields are left:

The first column is an auto-increment sequence number, which has no practical significance.

The second column indicates whether the student took CET-4 or CET-6.

The third column is the department name.

The fourth column is the student's major within the department.

The fifth column is the grade (enrollment year); 13 means the student enrolled in 2013.

The sixth column is gender.

The remaining columns are the total score and the listening, reading, and writing scores.

A total score of 0 means the student missed the exam. There are nearly 9000 records in total (students who did not register are not included).
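Because a total of 0 stands for a missing score rather than a real one, it is worth checking how many such rows exist before averaging anything. The sketch below uses made-up rows; the column names "language level" and "Total" are assumptions modeled on the code later in this article, not the real file.

```python
import pandas as pd

# Made-up rows; column names are assumptions, not the real spreadsheet.
scores = pd.DataFrame({
    "language level": ["CET4", "CET4", "CET6", "CET4"],
    "Total": [430, 0, 512, 390],
})

# Rows with a total of 0 carry no real score.
absent = scores["Total"] == 0
print("rows with a total of 0:", int(absent.sum()))

# Mean over all rows versus mean over rows that have a real score.
mean_all = scores["Total"].mean()
mean_present = scores.loc[~absent, "Total"].mean()
print(mean_all, mean_present)
```

Whether to exclude these rows depends on the question being asked; the averages computed in this article include every row.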

2. Expected results.

I want to use charts to show the following:

1. Average score of each school.

2. Number of students in each school who passed CET-4 or CET-6.

3. Number of students who passed in each grade of each school.

4. Number of students who passed in each grade.

5. Number of male and female students who passed.

Final result (figure: number of students in each school who passed CET-4 or CET-6).

3. Implementation process.

(1) Import the dependencies.

The program uses pandas for grouping and reshaping, and matplotlib for plotting.

import pandas as pd
import matplotlib.pyplot as plt

(2) Load the data.

The data to analyze is stored in sj.xls, an Excel file.
In this step, use pandas's read_excel to create a DataFrame object.

# Load all the data
sj = pd.read_excel(r'f:\DataAnalysis\sj.xls')

Printing sj after loading shows the following content:

Apart from the columns not lining up in the printed layout, the data loads as expected.

(3) Calculate the average score of each school.

Here we can complete the first expected result:

Average score of each school:

Each school's results naturally split into two groups, "CET4" and "CET6". Use groupby to build a SeriesGroupBy object, then call mean() to get the average total score of each group.

# Group the total score by school and language level
xymean = sj['Total'].groupby([sj['department name'], sj['language level']])
# Calculate the average score of each school
xymean = xymean.mean()

The output result is as follows:

Because the department name and language level form a hierarchical index, the output is not very readable. Use unstack to pivot the language level from rows to columns.

xymean = xymean.unstack(level='language level')

The output is now much easier to read.
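The group-then-reshape pattern in this step can be tried without the original file. Below is a minimal sketch on made-up rows; the department names and scores are invented, and the column names simply mirror the ones used in this article.

```python
import pandas as pd

# Invented sample rows that mimic the columns used in the article.
sample = pd.DataFrame({
    "department name": ["Music", "Music", "Music",
                        "Foreign Languages", "Foreign Languages"],
    "language level": ["CET4", "CET4", "CET6", "CET4", "CET6"],
    "Total": [400, 420, 380, 500, 520],
})

# Grouping by two keys gives a Series with a two-level index.
grouped_mean = sample["Total"].groupby(
    [sample["department name"], sample["language level"]]
).mean()
print(grouped_mean)

# unstack moves the "language level" index level into the columns.
wide = grouped_mean.unstack(level="language level")
print(wide)
```

After unstack, each department is one row and each exam level is one column, which is also the shape that DataFrame.plot expects when drawing grouped bars.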

Use the plotting function built into pandas:

# Display the result as a horizontal bar chart
xymean.plot(kind='barh')
# plt.show() is needed when running in PyCharm; it is not needed if IPython was started with --pylab
plt.show()

Run the script to see the result:

The data is plotted, but the Chinese labels are not displayed correctly. This is easy to fix; a quick search online turns up the solution: https://github.com/mwaskom/seaborn/issues/1009

Add the following code:

import matplotlib as mpl
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['font.serif'] = ['SimHei']

Run it again.

Next, we analyze the students who passed.

(4) Filter the data.

Now that all the data is loaded, the next step is to select the students who passed the exam.

# Keep only the students who passed (total score of at least 425)
sjpass = sj[sj['Total'] >= 425]

At this point, sjpass holds every student who passed.

The bottom of the output shows that there are 1507 rows in total. You can also use len(sjpass) or sjpass.shape[0] to get the number of rows.
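The filtering step is ordinary boolean indexing. A minimal sketch on invented scores shows the mask, the filtered frame, and the two ways of counting rows mentioned above; only the 425 threshold comes from the article.

```python
import pandas as pd

# Invented scores; only the 425 threshold comes from the article.
sample = pd.DataFrame({"Total": [424, 425, 0, 600, 300]})

# The comparison produces a boolean Series, one value per row.
mask = sample["Total"] >= 425

# Indexing with the mask keeps only the rows where it is True.
passed = sample[mask]

# Both expressions give the number of rows that passed.
print(len(passed), passed.shape[0])
```

Note that the filtered frame keeps the original row labels (here 1 and 3), which is harmless for the grouping that follows.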

(5) Number of students in each school who passed CET-4 or CET-6.

Now that we have all the passing records, we can group them to get the expected results. As before, group the total score by "department name" and "language level". Then use count to count the students in each group (count tallies rows; it does not sum the scores), and use unstack to reshape the result for plotting.
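To make the difference between counting and summing concrete, here is a small sketch on invented passing records. The data is hypothetical; the column names follow the article.

```python
import pandas as pd

# Invented passing records.
passed = pd.DataFrame({
    "department name": ["Music", "Music", "Physics"],
    "language level": ["CET4", "CET4", "CET4"],
    "Total": [430, 450, 500],
})

groups = passed["Total"].groupby(
    [passed["department name"], passed["language level"]]
)

# count() returns how many non-missing scores each group has.
n_passed = groups.count()

# sum() adds the scores up, which is not what we want here.
score_sum = groups.sum()

print(n_passed)
print(score_sum)
```

For the Music group, count gives the number of students while sum gives their combined score, so count is the right choice for a headcount.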

# Group by school and language level
xypass = sjpass['Total'].groupby([sjpass['department name'], sjpass['language level']])
# Count the students who passed in each school
xypass = xypass.count()
# Use the language level as columns
xypass = xypass.unstack(level='language level')
# Plot
xypass.plot(kind='barh')
plt.show()

Drawing result:

(6) Number of students who passed in each grade of each school.

This time the grouping also includes the grade. To make the chart easier to read, the grade is unstacked into columns. Some grades, such as grade 12, have no students in certain groups, so the missing values are filled with 0.
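A minimal sketch of why the fill is needed, using invented records: when a group has no rows for some grade, unstack leaves NaN in that cell. The example also shows the fill_value argument of Series.unstack as an alternative to fillna; the data and the choice between the two are illustrative assumptions.

```python
import pandas as pd

# Invented passing records; Physics has nobody in grade 13.
passed = pd.DataFrame({
    "department name": ["Music", "Music", "Physics"],
    "Year": [13, 14, 14],
    "Total": [430, 450, 500],
})

counts = passed["Total"].groupby(
    [passed["department name"], passed["Year"]]
).count()

# The missing (Physics, 13) combination becomes NaN after unstack.
with_gaps = counts.unstack(level="Year")
print(with_gaps)

# Option 1: fill afterwards, as the article does.
filled = with_gaps.fillna(0)

# Option 2: let unstack fill the gaps itself.
filled_alt = counts.unstack(level="Year", fill_value=0)
print(filled_alt)
```

Both options put 0 in the empty cell; the second avoids the intermediate NaN values.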

# Group by school, language level, and grade
xypass = sjpass['Total'].groupby([sjpass['department name'], sjpass['language level'], sjpass['Year']])
# Count the students who passed in each group
xypass = xypass.count()
# Use the grade as columns and fill missing values with 0
xypass = xypass.unstack(level='Year').fillna(0)
xypass.plot(kind='barh')
plt.show()

Drawing result:

(7) Number of students who passed in each grade.

Use groupby to group by grade:

# Number of students who passed in each grade
njpass = sjpass['Total'].groupby([sjpass['Year'], sjpass['language level']]).count().unstack(level='language level')
njpass.plot(kind='barh')
plt.show()

Drawing result:

(8) Number of male and female students who passed.

Group by gender and language level:

# Number of male and female students who passed
nvpass = sjpass['Total'].groupby([sjpass['gender'], sjpass['language level']]).count().unstack(level='language level')
nvpass.plot(kind='bar')
plt.show()

Drawing result:

4. Result Analysis.

The charts show that the music school has a relatively low average score compared with the other schools. The Art and Design school and the foreign languages school have relatively high average scores, but not many of their students passed. Art and Design in particular has few passes, mainly because the school has few students overall.

The number of students who passed CET-4 is significantly higher than the number who passed CET-6. In addition, the grade 15 students were in their second year, which is when students at our school can take the CET-4 and CET-6 exams, so grade 15 accounts for a large share of those who passed.

I have to admit that the pass rate for girls is higher than that for boys.
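The charts in this article show counts of passing students, and a count alone is not a rate. A pass rate per gender could be checked with a sketch like the one below. The data is invented, the "F"/"M" labels are placeholders, and dropping rows with a total of 0 before computing the rate is an assumption rather than something the article does.

```python
import pandas as pd

# Invented records; "F"/"M" are placeholder labels.
sample = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M", "M"],
    "Total": [430, 500, 380, 450, 300, 0, 410],
})

# Assumption: a total of 0 means no real score, so drop those rows.
took_exam = sample[sample["Total"] > 0]

# True where the student reached the 425 pass mark.
passed_flag = took_exam["Total"] >= 425

# The mean of a boolean Series is the fraction of True values.
pass_rate = passed_flag.groupby(took_exam["gender"]).mean()
print(pass_rate)
```

Comparing rates this way separates "more girls passed" from "more girls took the exam", which the raw counts cannot do.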

Source code and data: https://github.com/jiajia0/DataAnalysis

That is all for this article. I hope it helps with your learning.
