Today I discovered that the barrier to applying machine learning has dropped so low that it is almost easy, and I really can't find a reason not to dig into it. Up front, I want to thank Google for its contribution to the development of this technology. Perhaps someone else did 99% of the work and Google only did 1%, but I have to say: what a beautiful 1%.
Cutting to the chase: following a tutorial video by a Google engineer on YouTube [1], I completed my first machine learning program today, a "Hello World" for learning this skill.
Recorded here for future reference.
```
 1 from scipy.spatial import distance
 2 def euc(a, b):
 3     return distance.euclidean(a, b)
 4
 5 class KNNClassifier():
 6     def fit(self, X_train, y_train):
 7         self.X_train = X_train
 8         self.y_train = y_train
 9
10     def predict(self, X_test):
11         predictions = []
12         for row in X_test:
13             label = self.closest(row)
14             predictions.append(label)
15         return predictions
16
17     def closest(self, row):
18         best_dist = euc(row, self.X_train[0])
19         best_index = 0
20         for i in range(1, len(self.X_train)):
21             dist = euc(row, self.X_train[i])
22             if dist < best_dist:
23                 best_dist = dist
24                 best_index = i
25         return self.y_train[best_index]
26
27 from sklearn import datasets
28 iris = datasets.load_iris()
29 X = iris.data
30 y = iris.target
31
32 from sklearn.model_selection import train_test_split
33 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5)
34 print(X_train)
35 print(y_train)
36
37 my_classifier = KNNClassifier()
38 my_classifier.fit(X_train, y_train)
39 predictions = my_classifier.predict(X_test)
40
41 from sklearn.metrics import accuracy_score
42 print(accuracy_score(y_test, predictions))
```
A simple explanation of the above code:
1. Lines 1-3 reference the Euclidean distance function from SciPy's spatial distance module and wrap it in a simple helper. (Euclidean distance: the straight-line distance between two points in n-dimensional space.)
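As a quick sanity check (my own example, not from the tutorial), the classic 3-4-5 right triangle shows exactly what euc computes:

```python
from scipy.spatial import distance

# Straight-line distance between (0, 0) and (3, 4): sqrt(3**2 + 4**2)
print(distance.euclidean([0, 0], [3, 4]))  # 5.0
```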
2. Lines 5-25 define the classifier class; its key methods are fit and predict. fit simply stores the incoming training data in instance variables, and predict returns the expected label for each row passed in. The classifier here is hand-coded, not trained, so strictly speaking this is not yet real machine learning, but it illustrates the internal principle well: in actual machine learning, the closest function we wrote by hand is the part that would be learned from data, i.e., the model.
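For comparison (my addition, not part of the tutorial), scikit-learn ships a real nearest-neighbor classifier with the same fit/predict interface; with n_neighbors=1 it should behave almost identically to the hand-rolled class, reusing the X_train and y_train from the listing above:

```python
from sklearn.neighbors import KNeighborsClassifier

# Built-in equivalent of the hand-written class: one nearest neighbor
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)          # store/index the training data
predictions = clf.predict(X_test)  # label each row by its closest neighbor
```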
3. Lines 27-30 load the iris flower dataset bundled with the sklearn library, which serves as the data source for the rest of the experiment. iris.data is the raw data: a two-dimensional array in which each row holds four measurements of one flower (sepal length, sepal width, petal length, and petal width). iris.target is the variety corresponding to each row of data: 0, 1, and 2 stand for the three iris species (setosa, versicolor, and virginica).
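A few lines of inspection (my own addition) make that structure concrete:

```python
from sklearn import datasets

iris = datasets.load_iris()
print(iris.data.shape)     # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)  # sepal length/width and petal length/width, in cm
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(iris.data[0], iris.target[0])  # first flower's measurements and label
```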
4. Lines 32-35 split the loaded flower data into two groups: one group used to train, serving as the basis for prediction, and another group used to test the classifier's accuracy. Because the true labels of this validation group are also known, we can compare what the classifier produces against the real values and judge whether the classifier is reasonable. With the code above, the measured success rate is over 90%, so when the classifier is used on genuinely unknown new data, the results are quite credible.
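With test_size=.5 the 150 flowers are divided evenly; a quick check (my own addition) confirms the group sizes:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5)
print(len(X_train), len(X_test))  # 75 75: half for training, half for testing
```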
5. Lines 37-39 apply the classifier defined in step 2 to the data from step 4: X_train and y_train are fed to the classifier via fit, and then the classifier predicts, from the flower measurements in X_test, the variety of each flower, producing the corresponding prediction array predictions.
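The fitted classifier can also label a single new flower; the measurements below are just a made-up example row (my addition):

```python
# Four measurements: sepal length, sepal width, petal length, petal width (cm)
new_flower = [[5.1, 3.5, 1.4, 0.2]]
print(my_classifier.predict(new_flower))  # e.g. [0], i.e. setosa
```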
6. Lines 41-42 compare the actual flower varieties in y_test with the predicted results in predictions to see how well they agree. You can see it is not 100%: some information is always missing, and even the human eye would make the same mistakes.
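accuracy_score is just a convenience; the same comparison written out by hand (my own sketch) shows exactly what it measures:

```python
# Fraction of test flowers whose predicted label equals the true label
correct = sum(1 for p, t in zip(predictions, y_test) if p == t)
print(correct / len(y_test))  # matches the accuracy_score output
```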
Because train_test_split shuffles the data randomly, the train group and the test group differ from run to run, so the reported prediction accuracy will vary slightly each time.
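If a repeatable number is wanted, the split can be pinned down with the random_state parameter (my addition, not in the tutorial code):

```python
# Fixing random_state makes the shuffle, and hence the accuracy, reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.5, random_state=42)
```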
Although the classifier here reaches a high accuracy, the computation cannot be avoided: every prediction compares the new sample against all of the training data, so the computational cost is large. At the same time, it is only because of the nature of our data's attributes that we can predict well simply by finding the closest sample; in other applications, some attributes are not so linearly distributed, and the pattern is not one the human eye can find. That is when real training is needed.
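As a first taste of a genuinely trained model (my own sketch, reusing the split from the listing above), scikit-learn's decision tree learns its decision rules during fit instead of memorizing every training point:

```python
from sklearn import tree
from sklearn.metrics import accuracy_score

clf = tree.DecisionTreeClassifier()
clf.fit(X_train, y_train)  # the actual "training" happens here
print(accuracy_score(y_test, clf.predict(X_test)))
```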
Learn machine learning with Google [1]