K-means Clustering in R

Source: Internet
Author: User
Tags: volatile

Data source: Download the wine chemical composition data from the link below. The data come in two files, one for red wine and one for white wine. The two kinds of wine differ noticeably in chemical composition.

http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/


Analysis Process:

# 1) First combine the two data sets into one
# Import the data (the files at the link above are semicolon-delimited)
red <- read.csv("~/winequality-red.csv", header = TRUE, sep = ";")     # import red wine data
white <- read.csv("~/winequality-white.csv", header = TRUE, sep = ";") # import white wine data
# Add a new field, type: 1 = red wine, 2 = white wine
red$type <- 1; white$type <- 2
# Combine the two data sets into one
wine <- rbind(red, white)
# View the number of records before and after merging
nrow(red); nrow(white); nrow(wine)
[1] 1599
[1] 4898
[1] 6497
# View the first six and last six records of wine
head(wine); tail(wine)
  fixed.acidity volatile.acidity citric.acid residual.sugar chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
1           7.4             0.70        0.00            1.9     0.076                  11                   34  0.9978 3.51
2           7.8             0.88        0.00            2.6     0.098                  25                   67  0.9968 3.20
3           7.8             0.76        0.04            2.3     0.092                  15                   54  0.9970 3.26
4          11.2             0.28        0.56            1.9     0.075                  17                   60  0.9980 3.16
5           7.4             0.70        0.00            1.9     0.076                  11                   34  0.9978 3.51
6           7.4             0.66        0.00            1.8     0.075                  13                   40  0.9978 3.51
  sulphates alcohol quality type
1      0.56     9.4       5    1
2      0.68     9.8       5    1
3      0.65     9.8       5    1
4      0.58     9.8       6    1
5      0.56     9.4       5    1
6      0.56     9.4       5    1
     fixed.acidity volatile.acidity citric.acid residual.sugar chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
6492           6.5             0.23        0.38            1.3     0.032                  29                  112 0.99298 3.29
6493           6.2             0.21        0.29            1.6     0.039                  24                   92 0.99114 3.27
6494           6.6             0.32        0.36            8.0     0.047                  57                  168 0.99490 3.15
6495           6.5             0.24        0.19            1.2     0.041                  30                  111 0.99254 2.99
6496           5.5             0.29        0.30            1.1     0.022                  20                  110 0.98869 3.34
6497           6.0             0.21        0.38            0.8     0.020                  22                   98 0.98941 3.26
     sulphates alcohol quality type
6492      0.54     9.7       5    2
6493      0.50    11.2       6    2
6494      0.46     9.6       5    2
6495      0.46     9.4       6    2
6496      0.38    12.8       7    2
6497      0.32    11.8       6    2
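
One detail is worth checking at the import step: the wine-quality CSV files on the UCI server use semicolons, not commas, as the field separator. The sketch below illustrates the effect on a tiny inline sample (three columns only, values taken from the first two records shown above; it is not the real file). The names `csv_text`, `wrong` and `right` are made up for this illustration.

```r
# Tiny inline sample in the same semicolon-delimited layout (illustration only).
csv_text <- "fixed.acidity;volatile.acidity;quality
7.4;0.70;5
7.8;0.88;5"

# read.csv() defaults to a comma separator, so each line is read as one field.
wrong <- read.csv(text = csv_text, header = TRUE)

# Declaring the separator splits the columns as intended.
right <- read.csv(text = csv_text, header = TRUE, sep = ";")

ncol(wrong)   # 1
ncol(right)   # 3
str(right)    # numeric columns, ready for kmeans()
```

If `ncol()` on the imported data returns 1, the separator is the first thing to check.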
# 2) After combining the data, cluster on the chemical composition (possibly only some of the indicators are needed) to split the samples into red and white wine again
# Preliminary exploration showed that quality has little effect on separating the two wine types, so the clustering below leaves that column out
(c1 <- kmeans(wine[, 1:11], 2))
# Plot to view the overall clustering result
plot(wine$alcohol ~ wine$free.sulfur.dioxide, col = c1$cluster)
# Mark the cluster centers, using the same two columns as the plot axes
points(c1$centers[, c("free.sulfur.dioxide", "alcohol")], col = 1:2, pch = 8, cex = 2)
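
A note on scale: `kmeans()` works on Euclidean distances between the raw values, so a column with a large range (here total.sulfur.dioxide reaches values above 100, while density stays close to 1) can dominate the result. A common remedy, not used in the analysis above, is to standardize the columns with `scale()` first and to request several random starts with `nstart`. The sketch below shows the effect on synthetic stand-in data, not on the wine data; `toy`, `truth` and `agreement()` are hypothetical names introduced for the illustration.

```r
set.seed(42)  # k-means starts from random centers; fix the seed for repeatability

# Synthetic stand-in data: two groups of 100 points each.
# 'small' separates the groups but has a small range;
# 'large' is pure noise with a large range.
n <- 100
truth <- rep(1:2, each = n)
toy <- data.frame(
  small = c(rnorm(n, mean = 0, sd = 0.05), rnorm(n, mean = 1, sd = 0.05)),
  large = rnorm(2 * n, mean = 100, sd = 30)
)

# Share of points that agree with the true groups,
# allowing for the two cluster labels being swapped.
agreement <- function(truth, cluster) {
  acc <- mean(truth == cluster)
  max(acc, 1 - acc)
}

raw_fit    <- kmeans(toy, centers = 2, nstart = 25)         # raw values
scaled_fit <- kmeans(scale(toy), centers = 2, nstart = 25)  # standardized columns

acc_raw    <- agreement(truth, raw_fit$cluster)
acc_scaled <- agreement(truth, scaled_fit$cluster)
print(c(raw = acc_raw, scaled = acc_scaled))
```

On the wine data the corresponding call would be `kmeans(scale(wine[, 1:11]), centers = 2, nstart = 25)`. Whether it improves the agreement with the true wine types is something to check rather than assume.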

# 3) Compare the clustering result with the original wine types to see how well the clustering worked

table(wine[, 13], c1$cluster)

       1    2
  1 1514   85
  2 1294 3604
# The clustering is not very good: 85 samples that are actually red wine were assigned to the white wine cluster, and 1294 white wine samples were wrongly assigned to the red wine cluster.
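
The raw counts can also be read as per-type rates, which makes it easier to see that the errors fall mostly on one type. The sketch below rebuilds the table from the counts reported above (so it runs without the data files) and converts each row to proportions. The object names `tab` and `rates` are made up for the illustration.

```r
# Confusion table from the counts reported above:
# rows = true type (1 = red, 2 = white), columns = k-means cluster.
tab <- matrix(c(1514, 1294, 85, 3604), nrow = 2,
              dimnames = list(type = c("1", "2"), cluster = c("1", "2")))

# Row-wise proportions: each row sums to 1.
rates <- prop.table(tab, margin = 1)
round(rates, 3)
# Red wine (row 1): about 0.947 in cluster 1, 0.053 in cluster 2.
# White wine (row 2): about 0.264 in cluster 1, 0.736 in cluster 2.
```

So roughly 5% of the red wines but about 26% of the white wines land in the wrong cluster.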

# View the overall accuracy
sum(diag(table(wine[, 13], c1$cluster))) / nrow(wine)
[1] 0.7877482

The overall accuracy of the clustering result is 0.7877482, i.e. about 78.8% of the samples fall into the cluster that matches their wine type.
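
One caveat about this calculation: the cluster numbers that `kmeans()` returns are arbitrary, so on another run cluster 1 may correspond to white wine, and the diagonal sum would then count the mismatches instead of the matches. A minimal sketch of a label-independent version for the two-cluster case follows. `best_accuracy()` is a hypothetical helper, not a base R function, and the table is rebuilt from the counts reported above.

```r
# Overall agreement for a 2 x 2 table, taking the better of the two
# possible cluster-to-type assignments.
best_accuracy <- function(tab) {
  straight <- sum(diag(tab))       # cluster 1 = type 1, cluster 2 = type 2
  swapped  <- sum(tab) - straight  # cluster 1 = type 2, cluster 2 = type 1
  max(straight, swapped) / sum(tab)
}

tab <- matrix(c(1514, 1294, 85, 3604), nrow = 2)  # counts reported above
tab_swapped <- tab[, 2:1]                         # same clustering, labels flipped

sum(diag(tab_swapped)) / sum(tab_swapped)  # naive diagonal sum on flipped labels
best_accuracy(tab)                         # matches the 0.7877482 reported above
best_accuracy(tab_swapped)                 # same value regardless of labeling
```

For more than two clusters the same idea needs a search over label assignments, which this sketch does not cover.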
