Software: machine learning and Python, clustering, K-means

Last Update:2017-07-16 Source: Internet

Author: User

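K-means is a clustering algorithm: it partitions the samples into k groups so that each sample belongs to the cluster whose center is nearest to it.

Here we use K-means to group 31 cities by their consumption levels.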
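
To make the idea concrete, here is a minimal NumPy sketch of the two alternating steps (assign every point to its nearest center, then move every center to the mean of its members). It is only an illustration under simplifying assumptions: the function name `kmeans_sketch` and the toy points are made up, the first k rows serve as initial centers, and scikit-learn's own implementation chooses initial centers differently.

```python
import numpy as np


def kmeans_sketch(points, k, n_iter=100):
    """Plain K-means iterations; an illustration, not scikit-learn's code."""
    points = np.asarray(points, dtype=float)
    # Simplifying assumption: use the first k rows as the initial centers.
    centers = points[:k].copy()
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so the assignment is stable
        centers = new_centers
    return labels, centers


# Made-up 2-D points forming two tight groups (not the city data).
toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
labels, centers = kmeans_sketch(toy, k=2)
print(labels)
print(centers)
```

The result depends on the initial centers, which is why library implementations usually offer smarter initialization and several restarts.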

The data for the cities is stored in the file City.txt, whose contents are as follows:

bj,2959.19,730.79,749.41,513.34,467.87,1141.82,478.42,457.64
tianjin,2459.77,495.47,697.33,302.87,284.19,735.97,570.84,305.08
hebei,1495.63,515.90,362.37,285.32,272.95,540.58,364.91,188.63
shanxi,1406.33,477.77,290.15,208.57,201.50,414.72,281.84,212.10
nmg,1303.97,524.29,254.83,192.17,249.81,463.09,287.87,192.96
liaoning,1730.84,553.90,246.91,279.81,239.18,445.20,330.24,163.86
jilin,1561.86,492.42,200.49,218.36,220.69,459.62,360.48,147.76
hlj,1410.11,510.71,211.88,277.11,224.65,376.82,317.61,152.85
shanghai,3712.31,550.74,893.37,346.93,527.00,1034.98,720.33,462.03
jiangsu,2207.58,449.37,572.40,211.92,302.09,585.23,429.77,252.54
zhejiang,2629.16,557.32,689.73,435.69,514.66,795.87,575.76,323.36
anhui,1844.78,430.29,271.28,126.33,250.56,513.18,314.00,151.39
fujian,2709.46,428.11,334.12,160.77,405.14,461.67,535.13,232.29
jiangxi,1563.78,303.65,233.81,107.90,209.70,393.99,509.39,160.12
shandong,1675.75,613.32,550.71,219.79,272.59,599.43,371.62,211.84
henan,1427.65,431.79,288.55,208.14,217.00,337.76,421.31,165.32
hunan,1942.23,512.27,401.39,206.06,321.29,697.22,492.60,226.45
hubei,1783.43,511.88,282.84,201.01,237.60,617.74,523.52,182.52
guangdong,3055.17,353.23,564.56,356.27,811.88,873.06,1082.82,420.81
guangxi,2033.87,300.82,338.65,157.78,329.06,621.74,587.02,218.27
hainan,2057.86,186.44,202.72,171.79,329.65,477.17,312.93,279.19
chongqing,2303.29,589.99,516.21,236.55,403.92,730.05,438.41,225.80
sichuang,1974.28,507.76,344.79,203.21,240.24,575.10,430.36,223.46
guizhou,1673.82,437.75,461.61,153.32,254.66,445.59,346.11,191.48
yunnan,2194.25,537.01,369.07,249.54,290.84,561.91,407.70,330.95
xizang,2646.61,839.70,204.44,209.11,379.30,371.04,269.59,389.33
shanxi,1472.95,390.89,447.95,259.51,230.61,490.90,469.10,191.34
gansu,1525.57,472.98,328.90,219.86,206.65,449.69,249.66,228.19
qinghai,1654.69,437.77,258.78,303.00,244.93,479.53,288.56,236.51
ningxia,1375.46,480.89,273.84,317.32,251.08,424.75,228.73,195.93
xinjiang,1608.82,536.05,432.46,235.82,250.28,541.30,344.85,214.40

Originally the first field of each row was the city name in Chinese. Because reading Chinese text requires decoding, and that caused some problems, the names were simply replaced with pinyin. Each row holds the data for one city: the name followed by eight consumption figures.
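
As a small illustration of this row layout, the sketch below splits one row (copied from the data above) into a name and a list of numbers. The variable names are chosen only for this example.

```python
# One row copied from City.txt: a name followed by eight consumption figures.
row = "bj,2959.19,730.79,749.41,513.34,467.87,1141.82,478.42,457.64"

fields = row.strip().split(",")
name = fields[0]                         # first field is the city name
values = [float(x) for x in fields[1:]]  # remaining fields are numbers

print(name, len(values), values[0])
```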

Then save City.txt into the working folder. Which folder that is depends on the editor you use. I use Spyder: I created a project and placed City.txt in the project directory, where it was tested.

Then enter the following program in the project:

'''
Created on Wed Jul 05 09:13:43 2017
Author: gxton
Email: [Email protected]
Jiaotashidi Qiuzhenwushi
'''

import numpy as np  # the K-means implementation works on NumPy arrays
from sklearn.cluster import KMeans  # import only the part that is needed


def loadData(filePath):  # read the data file
    # .read() reads the whole file into a single string
    # .readlines() reads the whole file at once into a list of lines
    # .readline() reads one line at a time; it is usually much slower than
    #   .readlines(), so use it only when memory is too small for the file
    with open(filePath, 'r') as fr:  # open the file for reading
        lines = fr.readlines()
    retData = []      # consumption figures of each city
    retCityName = []  # city names
    for line in lines:
        items = line.strip().split(",")
        retCityName.append(items[0])
        retData.append([float(items[i]) for i in range(1, len(items))])
    return retData, retCityName  # the consumption data and the city names


if __name__ == '__main__':  # plays the role of a main function
    data, cityName = loadData('City.txt')  # must match the saved file name
    km = KMeans(n_clusters=4)  # create the estimator; the cities are divided into 4 clusters
    # Parameters of KMeans:
    #   n_clusters: the number of cluster centers
    #   init: how the initial cluster centers are chosen
    #   max_iter: the maximum number of iterations
    # Usually only n_clusters is given; init defaults to k-means++ and
    # max_iter also has a default value.
    label = km.fit_predict(data)  # compute the cluster centers and assign each city a label
    # label: the cluster that each row of data belongs to after clustering
    expenses = np.sum(km.cluster_centers_, axis=1)  # axis=1 sums along each row
    # print(expenses)
    CityCluster = [[], [], [], []]
    for i in range(len(cityName)):  # group the cities by their label
        CityCluster[label[i]].append(cityName[i])
    for i in range(len(CityCluster)):
        print("Expenses:%.2f" % expenses[i])  # average total spending of the cluster
        print(CityCluster[i])  # the cities in the cluster

Click Run to see the results.

Here n_clusters is the number of classes; cities with similar consumption levels are gathered into the same class.
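
The sketch below shows how the number of clusters and the other parameters might be passed explicitly. It runs on made-up points rather than the city data, and the parameter values are chosen only for illustration. `random_state` is an addition not used in the program above: it fixes the random initialization so that repeated runs give the same grouping.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points forming two tight groups (not the city data).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# n_init=10 restarts from ten initializations and keeps the best result;
# random_state=0 makes those random initializations repeatable.
km = KMeans(n_clusters=2, init="k-means++", n_init=10, max_iter=300, random_state=0)
labels = km.fit_predict(X)

print(labels)               # cluster index for every row of X
print(km.cluster_centers_)  # one center per cluster
```

The cluster numbers themselves carry no meaning: which group is called 0 and which is called 1 can change between initializations, so compare groupings rather than raw labels.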

Expenses: the sum of the coordinates of a cluster's center point, that is, the average consumption level of the cities in that cluster.
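
A short worked example may help show why the summed center equals the average total spending. The figures and the cluster assignment below are invented for illustration. Because a cluster center is the per-category mean of its members, summing the center gives the same number as averaging each member's total.

```python
import numpy as np

# Made-up figures: four "cities", three spending categories each.
data = np.array([[100.0, 50.0, 30.0],
                 [110.0, 60.0, 40.0],
                 [300.0, 90.0, 80.0],
                 [320.0, 110.0, 60.0]])
labels = np.array([0, 0, 1, 1])  # hypothetical cluster assignment

expenses = []
mean_totals = []
for k in range(2):
    members = data[labels == k]
    center = members.mean(axis=0)                     # per-category mean of the cluster
    expenses.append(center.sum())                     # sum over the categories
    mean_totals.append(members.sum(axis=1).mean())    # average of each city's total

for k in range(2):
    print("cluster %d: expenses %.2f, mean city total %.2f"
          % (k, expenses[k], mean_totals[k]))
```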

Implementation process:

1. Create the project and import the relevant sklearn packages:

import numpy as np

from sklearn.cluster import KMeans

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
