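K-means is a clustering algorithm: it splits the samples into k groups so that each sample belongs to the cluster whose center is nearest to it.
Here, we use K-means to cluster 31 cities by their consumption data.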
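To make the idea concrete, here is a minimal pure-Python sketch of the basic K-means loop (assign each point to its nearest center, then move each center to the mean of its points). It is only an illustration under simple assumptions: the function names, the random initialization, and the toy points are made up for this example, and it is not how scikit-learn implements KMeans.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(points, k, max_iter=100, seed=0):
    """Toy K-means loop (hypothetical helper): returns (labels, centers)."""
    rng = random.Random(seed)
    # Pick k distinct input points as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Made-up toy data: two well-separated groups of three points each.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
labels, centers = kmeans_sketch(points, k=2)
print(labels)
```

The label numbers themselves are arbitrary; what matters is which points share a label.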
The city data is stored in the City.txt file, which reads as follows:
bj,2959.19,730.79,749.41,513.34,467.87,1141.82,478.42,457.64
tianjin,2459.77,495.47,697.33,302.87,284.19,735.97,570.84,305.08
hebei,1495.63,515.90,362.37,285.32,272.95,540.58,364.91,188.63
shanxi,1406.33,477.77,290.15,208.57,201.50,414.72,281.84,212.10
nmg,1303.97,524.29,254.83,192.17,249.81,463.09,287.87,192.96
liaoning,1730.84,553.90,246.91,279.81,239.18,445.20,330.24,163.86
jilin,1561.86,492.42,200.49,218.36,220.69,459.62,360.48,147.76
hlj,1410.11,510.71,211.88,277.11,224.65,376.82,317.61,152.85
shanghai,3712.31,550.74,893.37,346.93,527.00,1034.98,720.33,462.03
jiangsu,2207.58,449.37,572.40,211.92,302.09,585.23,429.77,252.54
zhejiang,2629.16,557.32,689.73,435.69,514.66,795.87,575.76,323.36
anhui,1844.78,430.29,271.28,126.33,250.56,513.18,314.00,151.39
fujian,2709.46,428.11,334.12,160.77,405.14,461.67,535.13,232.29
jiangxi,1563.78,303.65,233.81,107.90,209.70,393.99,509.39,160.12
shandong,1675.75,613.32,550.71,219.79,272.59,599.43,371.62,211.84
henan,1427.65,431.79,288.55,208.14,217.00,337.76,421.31,165.32
hunan,1942.23,512.27,401.39,206.06,321.29,697.22,492.60,226.45
hubei,1783.43,511.88,282.84,201.01,237.60,617.74,523.52,182.52
guangdong,3055.17,353.23,564.56,356.27,811.88,873.06,1082.82,420.81
guangxi,2033.87,300.82,338.65,157.78,329.06,621.74,587.02,218.27
hainan,2057.86,186.44,202.72,171.79,329.65,477.17,312.93,279.19
chongqing,2303.29,589.99,516.21,236.55,403.92,730.05,438.41,225.80
sichuan,1974.28,507.76,344.79,203.21,240.24,575.10,430.36,223.46
guizhou,1673.82,437.75,461.61,153.32,254.66,445.59,346.11,191.48
yunnan,2194.25,537.01,369.07,249.54,290.84,561.91,407.70,330.95
xizang,2646.61,839.70,204.44,209.11,379.30,371.04,269.59,389.33
shaanxi,1472.95,390.89,447.95,259.51,230.61,490.90,469.10,191.34
gansu,1525.57,472.98,328.90,219.86,206.65,449.69,249.66,228.19
qinghai,1654.69,437.77,258.78,303.00,244.93,479.53,288.56,236.51
ningxia,1375.46,480.89,273.84,317.32,251.08,424.75,228.73,195.93
xinjiang,1608.82,536.05,432.46,235.82,250.28,541.30,344.85,214.40
Originally the first field of each row was the city name in Chinese, but reading Chinese text requires decoding and caused some problems, so I simply changed it to the pinyin city name. Each row holds the data for one city.
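If you would rather keep the Chinese names, one possible approach is to pass an explicit encoding when opening the file. The sketch below assumes Python 3 and a file saved as UTF-8; the function name load_data_utf8 and the two shortened demo rows are made up for illustration and are not part of the original program.

```python
import os
import tempfile


def load_data_utf8(file_path):
    """Parse 'name,v1,v2,...' rows from a UTF-8 text file (hypothetical helper)."""
    names, rows = [], []
    # An explicit encoding avoids depending on the platform default.
    with open(file_path, "r", encoding="utf-8") as fr:
        for line in fr:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            items = line.split(",")
            names.append(items[0])
            rows.append([float(v) for v in items[1:]])
    return rows, names


# Demo file with two shortened rows; the Chinese names stand in for
# the original first column.
sample = "北京,2959.19,730.79\n上海,3712.31,550.74\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", encoding="utf-8", delete=False
) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

demo_rows, demo_names = load_data_utf8(tmp_path)
os.remove(tmp_path)  # clean up the temporary demo file
print(demo_names, demo_rows)
```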
Then save the City.txt file in the working folder. Which folder that is depends on your editor. I use Spyder, so I created a project and placed City.txt in the project directory, where the program was tested.
Then enter the following program in the project:
'''
Created on Wed Jul 05 09:13:43 2017
Author: gxton
Email: [Email protected]
Jiaotashidi Qiuzhenwushi
'''
import numpy as np  # NumPy is needed alongside the k-means code
from sklearn.cluster import KMeans  # import only the KMeans class


def loadData(filePath):  # function that reads the data file
    with open(filePath, 'r') as fr:  # open the file for reading
        lines = fr.readlines()
    # .read() reads the entire file at once, typically into a single string variable
    # .readlines() also reads the entire file at once, but returns a list of lines
    # .readline() reads only one line per call and is usually much slower than .readlines(); use it only when memory is tight
    retData = []  # stores the consumption figures of each city
    retCityName = []  # stores the city names
    for line in lines:
        items = line.strip().split(",")
        retCityName.append(items[0])
        retData.append([float(items[i]) for i in range(1, len(items))])
    return retData, retCityName  # returns the consumption data and the city names


if __name__ == '__main__':  # plays the role of a main function
    data, cityName = loadData('City.txt')  # load the data with loadData
    km = KMeans(n_clusters=4)  # create the K-means instance; the cities are split into 4 clusters
    # Parameters of KMeans: n_clusters, the number of cluster centers
    #                       init, the method used to initialize the cluster centers
    #                       max_iter, the maximum number of iterations
    # Usually only n_clusters is given; init defaults to k-means++ and max_iter keeps its default
    label = km.fit_predict(data)  # call fit_predict() to run the computation:
    # it computes the cluster centers and assigns a cluster index to every sample
    # label: the cluster each city belongs to after clustering
    expenses = np.sum(km.cluster_centers_, axis=1)  # axis=1 sums along each row
    # print(expenses)
    CityCluster = [[], [], [], []]
    for i in range(len(cityName)):  # put each city into the cluster given by its label
        CityCluster[label[i]].append(cityName[i])
    for i in range(len(CityCluster)):
        print("expenses:%.2f" % expenses[i])  # print the average spending of each cluster
        print(CityCluster[i])  # print the cities in each cluster
Click Run to produce the results.
n_clusters is the number of clusters: cities with similar consumption levels are gathered into the same cluster.
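The program above hard-codes four empty lists to match n_clusters=4. As a sketch of an alternative that adapts to any number of clusters, the cities can be grouped with a dictionary keyed by label. The helper name group_by_label and the demo labels below are hypothetical; real labels would come from fit_predict.

```python
from collections import defaultdict


def group_by_label(names, labels):
    """Map each cluster label to the list of names assigned to it."""
    clusters = defaultdict(list)
    for name, lab in zip(names, labels):
        clusters[int(lab)].append(name)  # int() also accepts NumPy integers
    return dict(clusters)


# Hypothetical labels for illustration only, not the real clustering result.
demo_names = ["bj", "tianjin", "hebei", "shanghai"]
demo_labels = [0, 1, 2, 0]
for lab, cities in sorted(group_by_label(demo_names, demo_labels).items()):
    print(lab, cities)
```

With this shape, changing n_clusters does not require editing the grouping code.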
expenses is the sum of the coordinates of each cluster's center point, that is, the average total consumption level of the cities in that cluster.
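Why the sum of a center's coordinates is an average spending level can be checked by hand. The sketch below takes two rows from City.txt and treats them as one cluster purely for illustration (this grouping is an assumption, not the program's output). It also assumes the center is the plain mean of its members, which holds for K-means once the algorithm has converged.

```python
# Two rows copied from City.txt; putting them in one cluster is only an
# illustration, not the actual clustering result.
bj = [2959.19, 730.79, 749.41, 513.34, 467.87, 1141.82, 478.42, 457.64]
shanghai = [3712.31, 550.74, 893.37, 346.93, 527.00, 1034.98, 720.33, 462.03]
members = [bj, shanghai]

# Cluster center: column-wise mean of the member rows.
center = [sum(col) / len(members) for col in zip(*members)]

# "expenses" for this cluster: the sum of the center's coordinates.
# np.sum(km.cluster_centers_, axis=1) computes this for every center at once.
expense = sum(center)

# The same number is the mean of the members' total spending.
mean_of_totals = sum(sum(row) for row in members) / len(members)

print("%.2f %.2f" % (expense, mean_of_totals))
```

Both printed values agree, because summing the column means is the same as averaging the row totals.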
Implementation process:
1. Create the project and import the sklearn-related packages:
import numpy as np
from sklearn.cluster import KMeans