Software: machine learning with Python, clustering, K-means

Source: Internet
Author: User

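K-means is a clustering algorithm: it partitions the samples into k clusters, assigning each sample to the cluster whose center is nearest.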
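To make the idea concrete, here is a minimal sketch of the usual two-step iteration (assign each point to its nearest center, then move each center to the mean of its points) written with NumPy only. The function name `kmeans_sketch`, its parameters, and the toy points are assumptions made for illustration; this is not the scikit-learn implementation used later in the article.

```python
import numpy as np


def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Plain k-means iteration; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return labels, centers


# Toy data (made up for illustration): two well-separated groups.
toy = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]]
labels, centers = kmeans_sketch(toy, k=2)
print(labels)
print(centers)
```

The cluster numbers themselves are arbitrary; only which points share a number matters.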

Here, we use K-means to group 31 cities by their consumption levels.

The city data is stored in the file City.txt, whose contents are as follows:

bj,2959.19,730.79,749.41,513.34,467.87,1141.82,478.42,457.64
tianjin,2459.77,495.47,697.33,302.87,284.19,735.97,570.84,305.08
hebei,1495.63,515.90,362.37,285.32,272.95,540.58,364.91,188.63
shanxi,1406.33,477.77,290.15,208.57,201.50,414.72,281.84,212.10
nmg,1303.97,524.29,254.83,192.17,249.81,463.09,287.87,192.96
liaoning,1730.84,553.90,246.91,279.81,239.18,445.20,330.24,163.86
jilin,1561.86,492.42,200.49,218.36,220.69,459.62,360.48,147.76
hlj,1410.11,510.71,211.88,277.11,224.65,376.82,317.61,152.85
shanghai,3712.31,550.74,893.37,346.93,527.00,1034.98,720.33,462.03
jiangsu,2207.58,449.37,572.40,211.92,302.09,585.23,429.77,252.54
zhejiang,2629.16,557.32,689.73,435.69,514.66,795.87,575.76,323.36
anhui,1844.78,430.29,271.28,126.33,250.56,513.18,314.00,151.39
fujian,2709.46,428.11,334.12,160.77,405.14,461.67,535.13,232.29
jiangxi,1563.78,303.65,233.81,107.90,209.70,393.99,509.39,160.12
shandong,1675.75,613.32,550.71,219.79,272.59,599.43,371.62,211.84
henan,1427.65,431.79,288.55,208.14,217.00,337.76,421.31,165.32
hunan,1942.23,512.27,401.39,206.06,321.29,697.22,492.60,226.45
hubei,1783.43,511.88,282.84,201.01,237.60,617.74,523.52,182.52
guangdong,3055.17,353.23,564.56,356.27,811.88,873.06,1082.82,420.81
guangxi,2033.87,300.82,338.65,157.78,329.06,621.74,587.02,218.27
hainan,2057.86,186.44,202.72,171.79,329.65,477.17,312.93,279.19
chongqing,2303.29,589.99,516.21,236.55,403.92,730.05,438.41,225.80
sichuan,1974.28,507.76,344.79,203.21,240.24,575.10,430.36,223.46
guizhou,1673.82,437.75,461.61,153.32,254.66,445.59,346.11,191.48
yunnan,2194.25,537.01,369.07,249.54,290.84,561.91,407.70,330.95
xizang,2646.61,839.70,204.44,209.11,379.30,371.04,269.59,389.33
shaanxi,1472.95,390.89,447.95,259.51,230.61,490.90,469.10,191.34
gansu,1525.57,472.98,328.90,219.86,206.65,449.69,249.66,228.19
qinghai,1654.69,437.77,258.78,303.00,244.93,479.53,288.56,236.51
ningxia,1375.46,480.89,273.84,317.32,251.08,424.75,228.73,195.93
xinjiang,1608.82,536.05,432.46,235.82,250.28,541.30,344.85,214.40

The first field was originally the city name in Chinese, but reading Chinese text requires decoding and caused some problems, so I simply replaced it with the pinyin name. Each row is the data for one city: the name followed by eight consumption figures.
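If you would rather keep the Chinese names, the decoding problem can usually be avoided by stating the file encoding explicitly. The following is a minimal sketch, assuming Python 3 and a file saved as UTF-8; the file name `City_cn.txt` and the Chinese spellings of the first two rows are assumptions for illustration, and the file is written to a temporary folder so the example runs on its own.

```python
import os
import tempfile

# Hypothetical sample: the first two rows of City.txt with the names in Chinese.
sample = (
    "北京,2959.19,730.79,749.41,513.34,467.87,1141.82,478.42,457.64\n"
    "天津,2459.77,495.47,697.33,302.87,284.19,735.97,570.84,305.08\n"
)

path = os.path.join(tempfile.mkdtemp(), "City_cn.txt")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# Passing encoding= explicitly avoids depending on the platform default.
names, rows = [], []
with open(path, "r", encoding="utf-8") as f:
    for line in f:
        items = line.strip().split(",")
        names.append(items[0])                      # first field: city name
        rows.append([float(x) for x in items[1:]])  # remaining fields: figures

print(names)
print(rows[0][:3])
```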

Then save City.txt in the working folder. Which folder that is depends on your editor. I use Spyder, so I created a project and placed City.txt in the project directory, which is where the program was tested.

Then enter the following program in the project:

'''
Created on Wed Jul 05 09:13:43 2017
Author: gxton
Email: [Email protected]
Jiaotashidi Qiuzhenwushi
'''

import numpy as np                  # NumPy is needed to work with the k-means results
from sklearn.cluster import KMeans  # import only the KMeans class


def loadData(filePath):
    """Read the city file and return (consumption rows, city names)."""
    # .read() returns the whole file as a single string.
    # .readlines() also reads the whole file at once, but as a list of lines.
    # .readline() reads one line per call; it is usually much slower than
    # .readlines(), so use it only when there is not enough memory.
    with open(filePath, 'r') as fr:
        lines = fr.readlines()
    retData = []      # consumption figures for each city
    retCityName = []  # city names
    for line in lines:
        items = line.strip().split(",")
        retCityName.append(items[0])
        retData.append([float(items[i]) for i in range(1, len(items))])
    return retData, retCityName  # the consumption data and the city names


if __name__ == '__main__':  # entry point, the equivalent of a main function
    data, cityName = loadData('City.txt')  # load the data with loadData
    km = KMeans(n_clusters=4)  # create the K-means estimator; cities are split into 4 groups
    # Parameters of KMeans:
    #   n_clusters: the number of cluster centers
    #   init: how the initial cluster centers are chosen
    #   max_iter: the maximum number of iterations
    # Usually only n_clusters is given; init defaults to k-means++
    # and max_iter defaults to 300.
    label = km.fit_predict(data)
    # fit_predict computes the cluster centers and assigns each sample a
    # cluster number; label[i] is the cluster that city i belongs to.
    expenses = np.sum(km.cluster_centers_, axis=1)  # axis=1 sums across each row
    # print(expenses)
    cityCluster = [[], [], [], []]
    for i in range(len(cityName)):  # group the cities by their label
        cityCluster[label[i]].append(cityName[i])
    for i in range(len(cityCluster)):
        print("Expenses:%.2f" % expenses[i])  # average expenditure of the cluster
        print(cityCluster[i])                 # cities in the cluster

Click Run to see the results.
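Because K-means starts from randomly chosen centers, the cluster numbers (and occasionally the grouping itself) can differ from one run to the next. A minimal sketch of making the result repeatable by fixing `random_state`, using six rows copied from City.txt and two clusters; the choice of rows, `n_clusters=2`, and the seed value are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six rows copied from City.txt: bj, tianjin, hebei, shanxi, shanghai, guangdong.
names = ["bj", "tianjin", "hebei", "shanxi", "shanghai", "guangdong"]
data = np.array([
    [2959.19, 730.79, 749.41, 513.34, 467.87, 1141.82, 478.42, 457.64],
    [2459.77, 495.47, 697.33, 302.87, 284.19, 735.97, 570.84, 305.08],
    [1495.63, 515.90, 362.37, 285.32, 272.95, 540.58, 364.91, 188.63],
    [1406.33, 477.77, 290.15, 208.57, 201.50, 414.72, 281.84, 212.10],
    [3712.31, 550.74, 893.37, 346.93, 527.00, 1034.98, 720.33, 462.03],
    [3055.17, 353.23, 564.56, 356.27, 811.88, 873.06, 1082.82, 420.81],
])

# Same settings and same seed for both estimators.
km_a = KMeans(n_clusters=2, init="k-means++", n_init=10, max_iter=300, random_state=0)
km_b = KMeans(n_clusters=2, init="k-means++", n_init=10, max_iter=300, random_state=0)
labels_a = km_a.fit_predict(data)
labels_b = km_b.fit_predict(data)

print(dict(zip(names, labels_a)))
print((labels_a == labels_b).all())  # identical labels on both runs
```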

n_clusters is the number of clusters: cities with similar consumption levels are gathered into the same cluster.
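If you change n_clusters, the list that collects the city names has to grow or shrink with it. A small sketch of building that list from n_clusters instead of writing it out by hand; `group_by_label` is a hypothetical helper name, and the labels below are made up rather than real K-means output.

```python
def group_by_label(names, labels, n_clusters):
    """Return one list of names per cluster number."""
    clusters = [[] for _ in range(n_clusters)]  # one empty list per cluster
    for name, lab in zip(names, labels):
        clusters[lab].append(name)
    return clusters


names = ["bj", "tianjin", "hebei", "shanghai"]
labels = [1, 0, 2, 1]  # hypothetical labels, not real clustering output
print(group_by_label(names, labels, 3))
# → [['tianjin'], ['bj', 'shanghai'], ['hebei']]
```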

expenses: the sum of the values of each cluster's center point, that is, the average total consumption level of the cities in that cluster.
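Why the sum of a center equals the average consumption: once K-means has converged, a cluster's center is the mean of its member rows, so summing the center gives the same number as averaging the members' totals. A small check with NumPy, treating two rows copied from City.txt (hebei and shanxi) as one hypothetical cluster:

```python
import numpy as np

# Two rows copied from City.txt, grouped together only for illustration.
cluster = np.array([
    [1495.63, 515.90, 362.37, 285.32, 272.95, 540.58, 364.91, 188.63],  # hebei
    [1406.33, 477.77, 290.15, 208.57, 201.50, 414.72, 281.84, 212.10],  # shanxi
])

center = cluster.mean(axis=0)                   # the cluster center
expense_from_center = center.sum()              # what the program prints as "Expenses"
expense_from_rows = cluster.sum(axis=1).mean()  # average of each city's total

print("%.2f" % expense_from_center)
print(np.isclose(expense_from_center, expense_from_rows))
```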

Implementation process:

1. Create the project and import the sklearn-related packages:

import numpy as np

from sklearn.cluster import KMeans
