These are the test scores for all subjects in one grade. To avoid revealing the children's names, each name has been replaced with a student number. Anyone who is interested can download the test data.
num  class chn math eng   phy  chem  politics bio  history geo  pe
0158 3     99  120  114   70   49.5  50       49   48.5    49.5 60
0442 7     107 120  118.5 68.6 43    49       48.5 48.5    49   56
0249 4     98  120  116   70   47.5  47       49   47.5    49   60
0573 9     102 113  111.5 70   47    49       49   49      49.5 60
0310 5     103 120  111.5 70   44.75 46.5     48   48      48   60
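To experiment without downloading the full file, the five sample rows can be loaded straight from text. This is only a sketch: it assumes each row splits into num, class, and ten subject scores, and the names `sample_lines`, `sample_scores`, and `subject_cols` are made up for illustration.

```r
# Sketch only: the five sample rows, assuming each row splits into
# num, class, and ten subject scores
sample_lines <- c(
  "num class chn math eng phy chem politics bio history geo pe",
  "0158 3 99 120 114 70 49.5 50 49 48.5 49.5 60",
  "0442 7 107 120 118.5 68.6 43 49 48.5 48.5 49 56",
  "0249 4 98 120 116 70 47.5 47 49 47.5 49 60",
  "0573 9 102 113 111.5 70 47 49 49 49 49.5 60",
  "0310 5 103 120 111.5 70 44.75 46.5 48 48 48 60"
)

# Read num as text so that its leading zero is kept
sample_scores <- read.table(text = sample_lines, header = TRUE,
                            colClasses = c(num = "character"))

# Every column except num and class is a subject score
subject_cols <- setdiff(names(sample_scores), c("num", "class"))

# Total score per student over the ten subjects only
sample_scores$total <- rowSums(sample_scores[, subject_cols])
print(sample_scores[, c("num", "class", "total")])
```

Keeping num as an ordinary text column, rather than as row names, is a choice made here so that the student number can never be mistaken for a score.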
# Set the working directory (Windows path)
setwd("D:/scores_test")
# Read the score table; the first line is the header
scores <- read.table("Scores.txt", header=TRUE, row.names="num")
head(scores)
str(scores) # Displays the structure of the object
names(scores) # Displays the name of each column
attach(scores)
# Give a summary of the data
summary(scores)
summary(scores$math)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.00 84.00 100.00 93.98 111.00 120.00
# 1st Qu. is the first quartile (the 25th percentile)
# Select a row
child <- scores['239',]
sum(child) # The child's total score (the row still includes the class column)
[1] 647.45
scores.class4 <- scores[class==4,] # Pick out class 4
# Compute the average math score for each class
aver <- tapply(math, class, mean)
# Draw a line plot of the average math score for each class
plot(aver, type='b', ylim=c(80,100), main="Average math score by class", xlab="class", ylab="average math score")
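As a rough illustration of what tapply does here, the sketch below groups a tiny made-up data set by class and takes the mean of each group. The `toy` data frame and its values are hypothetical and are not taken from the score file.

```r
# Hypothetical toy data, not taken from the real score file
toy <- data.frame(
  class = c(1, 1, 2, 2, 2),
  math  = c(100, 110, 90, 96, 84)
)

# tapply splits math into groups by class and applies mean to each group
toy_aver <- tapply(toy$math, toy$class, mean)

# The result is named by class: class 1 -> 105, class 2 -> 90
print(toy_aver)
```

The result is indexed by the class labels, which is why plot(aver, type='b') can draw one point per class.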
# View the distribution of the data
table(math, class)
    class
math 1 2 3 4 5 6 7 8 9 10
3 0 0 0 0 0 0 1 0 0 0
9 1 0 0 0 0 0 0 0 0 0
10 1 0 1 0 0 0 0 0 0 0
18 0 0 0 1 0 1 0 0 1 0
...............
# Compute the average score in each subject for class 4
subjects <- c('chn','math','eng','phy','chem','politics','bio','history','geo','pe')
sapply(scores[class==4, subjects], mean)
     chn     math      eng      phy     chem politics      bio  history      geo       pe
83.10938 97.29688 85.60156 54.30469 34.67969 42.41406 41.79688 36.77344 44.24219 54.31250
# Look at the distribution of math scores
hist(math)
boxplot(math)
# Look at the correlations between the subjects
# The correlation between math and physics is 0.886 (about 88%), and the correlation between physics and chemistry is 0.862 (about 86%).
cor(scores[,subjects])
               chn      math       eng       phy      chem  politics       bio   history       geo        pe
chn      1.0000000 0.6588126 0.7326778 0.6578172 0.6271155 0.7257003 0.6902282 0.6971145 0.6438662 0.2712453
math     0.6588126 1.0000000 0.8079255 0.8860467 0.8304643 0.7090681 0.7951987 0.7732791 0.7723853 0.3300249
eng      0.7326778 0.8079255 1.0000000 0.8170998 0.7868710 0.7498946 0.7731044 0.7948219 0.7265406 0.3159347
phy      0.6578172 0.8860467 0.8170998 1.0000000 0.8615512 0.7081717 0.8077105 0.8100599 0.7814152 0.3251233
chem     0.6271155 0.8304643 0.7868710 0.8615512 1.0000000 0.6441334 0.7578770 0.7993298 0.7264814 0.2769066
politics 0.7257003 0.7090681 0.7498946 0.7081717 0.6441334 1.0000000 0.7071181 0.7192860 0.6906930 0.3033607
bio      0.6902282 0.7951987 0.7731044 0.8077105 0.7578770 0.7071181 1.0000000 0.7771735 0.8382525 0.2428081
history  0.6971145 0.7732791 0.7948219 0.8100599 0.7993298 0.7192860 0.7771735 1.0000000 0.7731044 0.2708434
geo      0.6438662 0.7723853 0.7265406 0.7814152 0.7264814 0.6906930 0.8382525 0.7731044 1.0000000 0.2605251
pe       0.2712453 0.3300249 0.3159347 0.3251233 0.2769066 0.3033607 0.2428081 0.2708434 0.2605251 1.0000000
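Reading the strongest pairs off a 10 by 10 matrix by eye is error-prone. One possible way to list them is sketched below on a 3 by 3 excerpt (math, phy, chem) whose values are copied from the matrix above. The names `r_small`, `idx`, and `pairs_df` are made up for illustration.

```r
# A 3 x 3 excerpt of the correlation matrix (values copied from the output above)
vars <- c("math", "phy", "chem")
r_small <- matrix(c(1.0000000, 0.8860467, 0.8304643,
                    0.8860467, 1.0000000, 0.8615512,
                    0.8304643, 0.8615512, 1.0000000),
                  nrow = 3, dimnames = list(vars, vars))

# The matrix is symmetric, so look only at the cells above the diagonal
idx <- which(upper.tri(r_small), arr.ind = TRUE)

pairs_df <- data.frame(
  a = rownames(r_small)[idx[, "row"]],
  b = colnames(r_small)[idx[, "col"]],
  r = r_small[idx]
)

# Sort from the strongest correlation to the weakest; math and phy come first
pairs_df <- pairs_df[order(-pairs_df$r), ]
print(pairs_df)
```

The same steps should work on the full result of cor(scores[,subjects]) in place of `r_small`.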
# Draw a scatterplot matrix to see the relationships
pairs(scores[,subjects])
# Take a closer look at the linear relationship between math and physics
cor_phy_math <- lm(phy ~ math, scores)
plot(math, phy)
abline(cor_phy_math)
cor_phy_math
# That is, the fitted formula is: phy = 0.5258 * math + 4.7374. Why about 0.52? Largely because math is scored out of 120 while physics is scored out of 70 (70/120 is about 0.58).
Call:
lm(formula = phy ~ math, data = scores)
Coefficients:
(Intercept)         math
     4.7374       0.5258
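The ratio of full marks is only a rough guide to the slope. For a simple linear regression, the fitted slope equals the correlation times the ratio of the standard deviations: slope = r * sd(phy) / sd(math). A minimal sketch on simulated data follows. The `toy_` names and every number in it are invented for illustration and do not come from the score file.

```r
# Hypothetical simulated data; none of these numbers come from the real score file
set.seed(1)
toy_math <- runif(200, min = 40, max = 120)
toy_phy  <- 4.7 + 0.53 * toy_math + rnorm(200, sd = 5)

# Slope as estimated by lm
toy_fit  <- lm(toy_phy ~ toy_math)
slope_lm <- unname(coef(toy_fit)["toy_math"])

# Slope from the identity: slope = r * sd(y) / sd(x)
slope_formula <- cor(toy_math, toy_phy) * sd(toy_phy) / sd(toy_math)

# The two values agree up to floating-point error
print(c(lm = slope_lm, formula = slope_formula))
```

So the slope reflects both how strongly the two subjects are related and how spread out each set of scores is, not just the two maximum scores.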
R Language Learning Notes: Analyzing students' test scores