Python Machine Learning and Practice: Classic Unsupervised Learning Models, Data Clustering and Feature Dimensionality Reduction

Source: Internet
Author: User

Unsupervised learning focuses on discovering the distribution characteristics of the data itself. Because the data does not need to be labeled, it saves a great deal of manual effort, and the amount of usable data is practically unlimited.

1. Data clustering: discover groups (communities) in the data; it can also be used to find outlier samples.

2. Feature dimensionality reduction: keep low-dimensional features that still distinguish the data.

Both are very useful techniques when processing massive amounts of data.

Data clustering

K-Means algorithm: the number of clusters is set in advance, and the cluster centers are updated iteratively so that the sum of squared distances from all data points to their cluster centers decreases and eventually stabilizes.
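The quantity described above can be written out as a formula. This is only a sketch in notation that the original text does not use: K is the preset number of clusters, C_k is the set of points assigned to cluster k, and \mu_k is the center of that cluster.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Each iteration reassigns points and recomputes the centers, and neither step increases J, which is why the value settles down.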

Process

① First, randomly place K points in the feature space as the initial cluster centers.

② Then, for the feature vector of each data point, find the nearest of the K cluster centers and mark the data point as belonging to that center.

③ After all data points have been labeled, recompute each cluster center from the data points newly assigned to it.

④ If, after a full round, no data point's assigned cluster differs from the previous round, the iteration can stop; otherwise, return to ② and continue the loop.
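The four steps above can be sketched directly in NumPy. This is a minimal illustration, not the scikit-learn implementation: the function name `kmeans_sketch`, its parameters, and the toy two-blob data are all hypothetical, and the initial centers are drawn from the data points themselves, which is one common way of doing the random initialization in step ①.

```python
import numpy as np

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Toy K-means following steps 1-4; names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct data points as initial centers (an assumption).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop once no assignment changed since the previous round.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its assigned samples.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    # Sum of squared distances from each point to its own cluster center.
    inertia = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, inertia

# Toy data: two well-separated 2-D blobs of 50 points each.
data_rng = np.random.default_rng(42)
X_toy = np.vstack([data_rng.normal(0.0, 0.5, (50, 2)),
                   data_rng.normal(5.0, 0.5, (50, 2))])
toy_labels, toy_centers, toy_inertia = kmeans_sketch(X_toy, k=2)
print(toy_centers, toy_inertia)
```

On data this cleanly separated, each blob should end up in its own cluster; on real data the result depends on the random initialization, which is why library implementations usually run several restarts.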

Example of using the K-means algorithm on handwritten digit image data

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn import metrics

# Use pandas to read the training and test datasets
digits_train = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra', header=None)
digits_test = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tes', header=None)

# Separate the 64-dimensional pixel features from the 1-dimensional digit target
# np.arange(64) selects columns 0..63; column 64 holds the digit label
X_train = digits_train[np.arange(64)]
y_train = digits_train[64]
X_test = digits_test[np.arange(64)]
y_test = digits_test[64]

# Initialize the KMeans model and set the number of cluster centers to ten
kmeans = KMeans(n_clusters=10)
# Clustering is unsupervised, so the labels are not passed to fit
kmeans.fit(X_train)
y_predict = kmeans.predict(X_test)

# Evaluate K-means clustering performance using ARI
print(metrics.adjusted_rand_score(y_test, y_predict))

Performance evaluation:

① If the data used for evaluation comes with the correct category labels, use the Adjusted Rand Index (ARI). ARI is similar to accuracy in a classification problem, but it also accounts for the fact that cluster IDs cannot be matched one-to-one with class labels.
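A small example shows why plain accuracy is not suitable here. The toy label arrays below are made up for illustration: the second array groups the samples identically to the first but with the cluster IDs swapped.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Hypothetical toy labels: same grouping, but the cluster IDs are swapped.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_swapped = np.array([1, 1, 1, 0, 0, 0])

# Comparing IDs position by position treats every sample as wrong.
naive_accuracy = float(np.mean(y_true == y_swapped))

# ARI compares how pairs of samples are grouped, so the IDs do not matter.
ari = adjusted_rand_score(y_true, y_swapped)

print(naive_accuracy, ari)
```

Here the naive accuracy is 0 while the ARI is 1, because the two labelings describe exactly the same partition of the samples.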

② If the data used for evaluation has no category labels, the silhouette coefficient is commonly used to measure the quality of the clustering result. It takes into account both the cohesion within each cluster and the separation between clusters.

The silhouette coefficient takes values in the range [-1, 1]. The larger the value, the better the clustering.
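As a rough illustration of this label-free evaluation, the sketch below scores several candidate cluster counts on synthetic data. The blob centers, the candidate values of k, and the variable names are assumptions chosen for the example, not part of the original article.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated 2-D blobs (illustrative only).
X_blobs, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

# Score each candidate number of clusters without using any true labels.
scores = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_blobs)
    scores[k] = silhouette_score(X_blobs, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Picking the k with the highest silhouette coefficient is a common heuristic when the true number of clusters is unknown; on real data the peak is usually less clear-cut than on blobs like these.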
