"Sklearn series" KNN algorithm

Source: Internet
Author: User

The K-nearest neighbors (KNN) classification concept, explained

We use the KNeighborsClassifier class from scikit-learn's neighbors module to implement KNN. Its signature, with default values, is:

from sklearn import neighbors
neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1)

n_neighbors determines the K value used in the majority-vote rule, i.e., how many of the nearest points around a query point are consulted.

weights: This parameter is quite interesting. It controls how each nearest neighbor is weighted when classifying a point. Its default value is 'uniform', meaning equal weights, in which case the majority-vote rule decides the predicted class of the input instance. The alternative 'distance' weights each neighbor by the reciprocal of its distance, so the class of a point very close to the input instance is more persuasive than that of a farther point. For example, suppose the three data points closest to the query point are one of class A and two of class B, with the class A point very close to the query point and the two class B points slightly farther away. With uniform weighting, 3-NN classifies the query point as class B. With distance weighting, the class A point has a higher weight (because it is closer), and if its weight exceeds the sum of the two class B weights (the majority-vote rule counts votes; here we only need A's weight to exceed the total B weight), the algorithm classifies the query point as class A. Which option to choose depends on the application. Finally, the user can also pass a custom callable to define the weighting scheme.
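The A-versus-B scenario above can be reproduced directly. This is a minimal sketch with made-up coordinates chosen so that one class A point sits very close to the query while two class B points sit farther away:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data matching the example above: one class-A point very
# close to the query point, two class-B points noticeably farther away.
X = np.array([[1.0, 1.0], [3.0, 3.0], [3.5, 3.0]])
y = ['A', 'B', 'B']
query = [[1.2, 1.0]]  # distance 0.2 to the A point, ~2.7 and ~3.0 to the B points

uniform = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X, y)
distance = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X, y)

print(uniform.predict(query))   # majority vote: the two B neighbors win -> ['B']
print(distance.predict(query))  # reciprocal-distance weights favour A -> ['A']
```

The same three neighbors are consulted in both cases; only the weighting rule changes the verdict.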

algorithm selects the underlying neighbor-search method, one of {'auto', 'ball_tree', 'kd_tree', 'brute'}. In general 'auto' is sufficient: it automatically picks the most appropriate method.

p: As covered in the Machine Learning series, p=1 makes the distance metric the Manhattan distance, and p=2 the Euclidean distance. The default value is 2.
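Both are special cases of the Minkowski metric that KNeighborsClassifier uses by default. A quick sketch of the formula (the helper name minkowski is ours, not scikit-learn's):

```python
import numpy as np

# Minkowski distance of order p between two feature vectors:
# (sum of |a_i - b_i|^p) ** (1/p)
def minkowski(a, b, p):
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p) ** (1 / p)

a, b = [0, 0], [3, 4]
print(minkowski(a, b, 1))  # p=1, Manhattan distance: |3| + |4| = 7
print(minkowski(a, b, 2))  # p=2, Euclidean distance: sqrt(9 + 16) = 5
```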

Next, we call the fit() function to build a KNN model:

knn = neighbors.KNeighborsClassifier()
knn.fit(X, y)

Here X is array-like (see the example below): each row of X can be a tuple, a list, or a one-dimensional array, but note that all rows must have the same length (equal to the number of features). This is a very important point. We can think of X as a matrix in which each row holds the feature values of one input instance.

y is a list or one-dimensional array of the same length as X, where each element is the class label of the corresponding row in X.

The next step is to make predictions:

knn.predict(X)

The X passed here is again an array; for two-dimensional features it looks like [[0, 1], [2, 1], ...].

Probability estimates

knn.predict_proba(X)

The output is an array in which each element is the probability that the input instance belongs to the corresponding class. The column order follows the sorted order of the labels in y. If you pass more than one input instance, the output becomes the corresponding [[p1, p2], [p3, p4], ...].
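A small sketch of that column ordering, using made-up data. The fitted classifier exposes the sorted labels as the classes_ attribute, and the predict_proba columns line up with it:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: labels are deliberately given out of sorted order.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = ['B', 'B', 'A', 'A']
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

print(knn.classes_)                   # column order of predict_proba: ['A' 'B']
# The 3 nearest neighbors of [0, 0.2] are two B points and one A point,
# so the probabilities are 1/3 for A and 2/3 for B:
print(knn.predict_proba([[0, 0.2]]))  # [[0.33333333 0.66666667]]
```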

Accuracy: score

knn.score(X, y, sample_weight=None)

We typically divide the dataset into two parts, one for training the model and one for testing. This function measures the accuracy on the test part after training.
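That train-then-test workflow can be sketched with scikit-learn's train_test_split helper on the built-in iris dataset (our choice of dataset and split ratio, purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Split the iris data into a training part and a held-out test part,
# then measure accuracy on the unseen test samples.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_test, y_test))  # fraction of test samples predicted correctly
```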

A practical example

First we revisit the movie classification example from the KNN article in the Machine Learning series. There we implemented a KNN classifier by hand using the Euclidean distance; here we use the scikit-learn function directly, so the two can be compared.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Note: the inputs are arrays. If your data is a DataFrame (the usual
# result of importing a CSV file), pass X_train.values and y_train.values
# to fit() instead.
X_train = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
y_train = ['A', 'A', 'B', 'B']
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
knn.predict([[5, 0], [4, 0]])  # note: prediction also takes array form

