Data source: Download the wine chemical composition data from the link below. The data are split into two files, one for red wine and one for white wine; the two types differ noticeably in chemical composition.
http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/
Analysis Process:
# 1) First merge the two data sets into one
# Import the data (the files as published by UCI are semicolon-delimited; drop sep if your copy uses commas)
red <- read.csv("~/winequality-red.csv", header = TRUE, sep = ";")     # import the red wine data
white <- read.csv("~/winequality-white.csv", header = TRUE, sep = ";") # import the white wine data
# Add a new field type: 1 = red wine, 2 = white wine
red$type <- 1; white$type <- 2
# Merge the two data sets into one
wine <- rbind(red, white)
# Check the number of records before and after merging
nrow(red); nrow(white); nrow(wine)
[1] 1599
[1] 4898
[1] 6497
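The tag-then-stack step can be tried on a small scale first. The following is a minimal sketch, assuming two made-up data frames (`red_toy` and `white_toy` are hypothetical stand-ins, not the real files) that share the same column names:

```r
# Two tiny stand-in data frames (hypothetical values) with identical columns
red_toy   <- data.frame(alcohol = c(9.4, 9.8), pH = c(3.51, 3.20))
white_toy <- data.frame(alcohol = c(8.8, 9.5, 10.1), pH = c(3.00, 3.30, 3.26))

# Tag each source before stacking so the origin of every row is kept
red_toy$type   <- 1   # 1 = red wine
white_toy$type <- 2   # 2 = white wine

# rbind() stacks rows; both data frames must have the same column names
wine_toy <- rbind(red_toy, white_toy)

# The merged row count should equal the sum of the two inputs
nrow(wine_toy) == nrow(red_toy) + nrow(white_toy)
table(wine_toy$type)
```

Adding the type column before the merge is what later makes it possible to compare the clusters with the true wine type.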
# View the first six and last six records of wine
head(wine); tail(wine)
  fixed.acidity volatile.acidity citric.acid residual.sugar chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
1           7.4             0.70        0.00            1.9     0.076                  11                   34  0.9978 3.51
2           7.8             0.88        0.00            2.6     0.098                  25                   67  0.9968 3.20
3           7.8             0.76        0.04            2.3     0.092                  15                   54  0.9970 3.26
4          11.2             0.28        0.56            1.9     0.075                  17                   60  0.9980 3.16
5           7.4             0.70        0.00            1.9     0.076                  11                   34  0.9978 3.51
6           7.4             0.66        0.00            1.8     0.075                  13                   40  0.9978 3.51
  sulphates alcohol quality type
1      0.56     9.4       5    1
2      0.68     9.8       5    1
3      0.65     9.8       5    1
4      0.58     9.8       6    1
5      0.56     9.4       5    1
6      0.56     9.4       5    1
     fixed.acidity volatile.acidity citric.acid residual.sugar chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
6492           6.5             0.23        0.38            1.3     0.032                  29                  112 0.99298 3.29
6493           6.2             0.21        0.29            1.6     0.039                  24                   92 0.99114 3.27
6494           6.6             0.32        0.36            8.0     0.047                  57                  168 0.99490 3.15
6495           6.5             0.24        0.19            1.2     0.041                  30                  111 0.99254 2.99
6496           5.5             0.29        0.30            1.1     0.022                  20                  110 0.98869 3.34
6497           6.0             0.21        0.38            0.8     0.020                  22                   98 0.98941 3.26
     sulphates alcohol quality type
6492      0.54     9.7       5    2
6493      0.50    11.2       6    2
6494      0.46     9.6       5    2
6495      0.46     9.4       6    2
6496      0.38    12.8       7    2
6497      0.32    11.8       6    2
# 2) After merging, cluster the samples by chemical composition (possibly only some of the indicators are needed) to split them into red and white wine again
# Preliminary exploration shows that quality has little bearing on the wine type, so it is left out of the clustering below
(c1 <- kmeans(wine[, 1:11], 2))
# Plot the result to view the overall clustering
plot(wine$alcohol ~ wine$free.sulfur.dioxide, col = c1$cluster)
# Mark the cluster centres, using the same two variables as the plot axes
points(c1$centers[, c("free.sulfur.dioxide", "alcohol")], col = 1:2, pch = 8, cex = 2)
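Because kmeans() starts from randomly chosen centres, repeated runs can give different cluster numbering and, on harder data, different counts. The following is a minimal sketch on synthetic data (the `toy` matrix and its values are made up for illustration and are not the wine data), assuming a fixed seed and several restarts are acceptable:

```r
# Synthetic stand-in for two indicators: two well-separated groups of 50 points
set.seed(42)   # fix the random number stream so the run is repeatable
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2)
)
colnames(toy) <- c("x", "y")

# nstart = 10 runs ten random starts and keeps the best solution
fit <- kmeans(toy, centers = 2, nstart = 10)

fit$size       # number of points assigned to each cluster
fit$centers    # one row per cluster, one column per variable
head(fit$cluster)  # cluster label (1 or 2) for the first few points
```

The `centers` matrix has one column per clustering variable, so when centres are drawn on a scatter plot the two columns matching the plot axes have to be picked out explicitly.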
# 3) Compare the clustering result with the original types to see how well the clustering worked
table(wine[, 13], c1$cluster)

       1    2
  1 1514   85
  2 1294 3604
# The clustering is not very good: 85 samples that are actually red wine were assigned to the white wine cluster, and 1294 white wine samples were wrongly assigned to the red wine cluster.
# View the overall accuracy (the diagonal is valid here because cluster 1 lines up with type 1 and cluster 2 with type 2)
sum(diag(table(wine[, 13], c1$cluster))) / nrow(wine)
[1] 0.7877482
The overall accuracy after clustering is 0.7877482, or about 78.8%.
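The accuracy figure can be checked by hand from the cross-tabulation. The sketch below re-enters the four counts reported above into a matrix (the names `tab`, `acc_as_is` and `acc_swapped` are introduced here for illustration). Since k-means numbers its clusters arbitrarily, it also computes the accuracy for the swapped labelling and keeps the larger of the two, which is an added safeguard and not part of the original analysis:

```r
# Counts copied from the cross-tabulation: rows = true type, columns = cluster
tab <- matrix(c(1514, 1294, 85, 3604), nrow = 2,
              dimnames = list(type = c("1", "2"), cluster = c("1", "2")))

# Accuracy if cluster 1 means red and cluster 2 means white (the diagonal)
acc_as_is <- sum(diag(tab)) / sum(tab)

# Accuracy if the cluster numbers happen to be the other way round (the anti-diagonal)
acc_swapped <- sum(tab[cbind(1:2, 2:1)]) / sum(tab)

# Take whichever labelling agrees better with the true types
accuracy <- max(acc_as_is, acc_swapped)
round(accuracy, 7)
```

Here the diagonal holds 1514 + 3604 = 5118 of the 6497 samples, which gives the same value as the result above.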