KMeans clustering analysis in Python
Today I used Python to implement a simple cluster analysis. Along the way I got familiar with some numpy array operations and plotting techniques, so I am recording them here.
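The script below relies on numpy's append() to merge arrays, so here is a minimal sketch of what the axis parameter does. The small arrays a and b are made up for illustration, and np.concatenate is shown only as one possible alternative to repeated append() calls.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6], [7, 8]])   # shape (2, 2)

# axis=0 stacks the rows (vertical merge), like MATLAB's [A; B]
stacked_rows = np.append(a, b, axis=0)   # shape (4, 2)

# axis=1 stacks the columns (horizontal merge), like MATLAB's [A, B]
stacked_cols = np.append(a, b, axis=1)   # shape (2, 4)

# np.concatenate takes a sequence, so several arrays can be merged in one call
merged = np.concatenate([a, b, a], axis=0)   # shape (6, 2)

print(stacked_rows.shape, stacked_cols.shape, merged.shape)
```

Merging many arrays with a single concatenate call avoids copying the growing array once per append().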
from pylab import *
from sklearn.cluster import KMeans

# numpy.append() merges multi-dimensional arrays the way MATLAB does.
# axis=0 merges along the rows (vertically) and axis=1 merges along the
# columns (horizontally), matching MATLAB's [A; B] and [A, B] respectively.

# Create five random datasets
x1 = append(randn(500, 1) + 5, randn(500, 1) + 5, axis=1)
x2 = append(randn(500, 1) + 5, randn(500, 1) - 5, axis=1)
x3 = append(randn(500, 1) - 5, randn(500, 1) + 5, axis=1)
x4 = append(randn(500, 1) - 5, randn(500, 1) - 5, axis=1)
x5 = append(randn(500, 1), randn(500, 1), axis=1)

# A clumsy way to merge the five datasets into one array, data, of size (2500, 2)
data = append(x1, x2, axis=0)
data = append(data, x3, axis=0)
data = append(data, x4, axis=0)
data = append(data, x5, axis=0)

# Plot each dataset in its own color
plot(x1[:, 0], x1[:, 1], 'oc', markersize=0.8)
plot(x2[:, 0], x2[:, 1], 'og', markersize=0.8)
plot(x3[:, 0], x3[:, 1], 'ob', markersize=0.8)
plot(x4[:, 0], x4[:, 1], 'om', markersize=0.8)
plot(x5[:, 0], x5[:, 1], 'oy', markersize=0.8)

k = KMeans(n_clusters=5, random_state=0).fit(data)
t = k.cluster_centers_  # get the cluster centers
plot(t[:, 0], t[:, 1], 'r*', markersize=16)  # show the five centers as red stars

title('KMeans Clustering')
box(False)
xticks([])  # remove the axis tick marks
yticks([])
show()
The resulting plot shows the five point clouds in different colors, with the five cluster centers marked by red stars.
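The clustering can also be checked numerically rather than by eye. The following is a minimal sketch and not part of the original script: the fixed seed, the offsets list, the explicit n_init=10, and the two test points passed to predict() are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)   # fixed seed (assumption) so runs are repeatable
offsets = [(5, 5), (5, -5), (-5, 5), (-5, -5), (0, 0)]

# One (500, 2) Gaussian blob per offset, merged into a (2500, 2) array
data = np.concatenate([rng.randn(500, 2) + off for off in offsets], axis=0)

k = KMeans(n_clusters=5, random_state=0, n_init=10).fit(data)

centers = k.cluster_centers_               # shape (5, 2)
labels = k.labels_                         # cluster index of every point
sizes = np.bincount(labels, minlength=5)   # number of points per cluster

print(np.round(centers, 1))
print(sizes)

# predict() assigns new points to the nearest learned center
print(k.predict(np.array([[4.8, 5.1], [0.2, -0.3]])))
```

Because the blobs are well separated, each learned center should land close to one of the offsets and each cluster should hold roughly 500 points. The order of the centers is arbitrary.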
Update
Today the program failed when I re-ran it, with an error saying that importing NUMPY_MKL failed. The cause was that I had earlier updated numpy manually with pip install -U numpy. I had originally installed numpy from the numpy-1.11.2+mkl-cp27-cp27m-win_amd64.whl file downloaded from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy, so reinstalling that file fixed the problem.
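To see which kind of numpy build is currently installed, a small check like the one below may help. It is only a sketch: has_numpy_mkl is a hypothetical helper name, and it assumes that the numpy+mkl wheels expose a NUMPY_MKL name in numpy._distributor_init, which packages built against them import at start-up.

```python
import numpy as np


def has_numpy_mkl():
    """Return True if this numpy build exposes the NUMPY_MKL marker."""
    try:
        # Assumption: numpy+mkl wheels define this name; a plain pip build
        # does not, which is what triggers the import error
        from numpy._distributor_init import NUMPY_MKL  # noqa: F401
        return True
    except ImportError:
        return False


print("numpy version:", np.__version__)
print("NUMPY_MKL marker present:", has_numpy_mkl())
```

If the marker is missing after an upgrade, reinstalling the original wheel file with pip restores the MKL build.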
Update
Python also has a package named plotly, which can draw polished figures. Install it with pip install plotly, or pip3 install plotly for Python 3.x. The official website has many examples, and plotly also supports MATLAB, R, and other languages. Personally I find plotly's plotting syntax more complex than matplotlib's, but it is fairly easy to adapt an existing example. If you only want better-looking data visualization, you can start from an example on the official website and modify it. Below is a sample from the official website:
import numpy as np
import plotly.graph_objs as go
import plotly.offline

# plotly.plotly (the online API) is not needed for offline plotting;
# in plotly 4 and later it lives in the separate chart_studio package.

# Generate three sets of Gaussian-distributed points
x0 = np.random.normal(2, 0.45, 300)
y0 = np.random.normal(2, 0.45, 300)
x1 = np.random.normal(6, 0.8, 200)
y1 = np.random.normal(6, 0.8, 200)
x2 = np.random.normal(4, 0.3, 200)
y2 = np.random.normal(4, 0.3, 200)

# Create the graph objects
trace0 = go.Scatter(x=x0, y=y0, mode='markers')
trace1 = go.Scatter(x=x1, y=y1, mode='markers')
trace2 = go.Scatter(x=x2, y=y2, mode='markers')
trace3 = go.Scatter(x=x1, y=y0, mode='markers')

# The layout is a dictionary; its keys here are 'shapes' and 'showlegend'
layout = {
    'shapes': [
        {
            'type': 'circle', 'xref': 'x', 'yref': 'y',
            'x0': min(x0), 'y0': min(y0), 'x1': max(x0), 'y1': max(y0),
            'opacity': 0.2, 'fillcolor': 'blue',
            'line': {'color': 'blue'},
        },
        {
            'type': 'circle', 'xref': 'x', 'yref': 'y',
            'x0': min(x1), 'y0': min(y1), 'x1': max(x1), 'y1': max(y1),
            'opacity': 0.2, 'fillcolor': 'orange',
            'line': {'color': 'orange'},
        },
        {
            'type': 'circle', 'xref': 'x', 'yref': 'y',
            'x0': min(x2), 'y0': min(y2), 'x1': max(x2), 'y1': max(y2),
            'opacity': 0.2, 'fillcolor': 'green',
            'line': {'color': 'green'},
        },
        {
            'type': 'circle', 'xref': 'x', 'yref': 'y',
            'x0': min(x1), 'y0': min(y0), 'x1': max(x1), 'y1': max(y0),
            'opacity': 0.2, 'fillcolor': 'red',
            'line': {'color': 'red'},
        },
    ],
    'showlegend': False,
}

data = [trace0, trace1, trace2, trace3]

# The figure combines the traces and the layout
fig = {'data': data, 'layout': layout}

# Draw offline: I have not registered on the official website and the site
# is hard to use, so the figure is written to a local HTML file instead.
plotly.offline.plot(fig, filename='clusters.html')
Running the script opens the figure in the browser and saves it locally as an HTML file.
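The four entries under 'shapes' in the plotly script differ only in their data and color, so the repetition can be factored out. The sketch below is one possible way to do that. bounding_circle is a hypothetical helper, not part of plotly, and the fixed seed is an assumption for repeatability. It only builds the layout dictionary, so it runs without plotly installed.

```python
import numpy as np


def bounding_circle(x, y, color):
    """Build one plotly-style 'circle' shape dict spanning the x and y ranges."""
    return {
        'type': 'circle',
        'xref': 'x',
        'yref': 'y',
        'x0': float(np.min(x)), 'y0': float(np.min(y)),
        'x1': float(np.max(x)), 'y1': float(np.max(y)),
        'opacity': 0.2,
        'fillcolor': color,
        'line': {'color': color},
    }


rng = np.random.RandomState(0)   # fixed seed (assumption)
x0, y0 = rng.normal(2, 0.45, 300), rng.normal(2, 0.45, 300)
x1, y1 = rng.normal(6, 0.8, 200), rng.normal(6, 0.8, 200)
x2, y2 = rng.normal(4, 0.3, 200), rng.normal(4, 0.3, 200)

# Same four data/color pairings as the shapes in the script above
groups = [(x0, y0, 'blue'), (x1, y1, 'orange'), (x2, y2, 'green'), (x1, y0, 'red')]

layout = {
    'shapes': [bounding_circle(x, y, color) for x, y, color in groups],
    'showlegend': False,
}

print(len(layout['shapes']), layout['shapes'][0]['fillcolor'])
```

The resulting layout dictionary has the same structure as the hand-written one and can be placed into fig in the same way.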
Summary: Although plotly's syntax is cumbersome, it comes into its own when the requirements for presenting data are high. matplotlib is convenient for general plotting; in IPython, running from pylab import * gives a working environment similar to MATLAB.
That is all for this article. I hope it helps with your learning.