"Sklearn series" KNN algorithm

Source: Internet
Author: User

The K-nearest neighbors (KNN) classification concept, explained

We use the KNeighborsClassifier class from scikit-learn's neighbors module to implement KNN. Its signature, with default values, is:

from sklearn import neighbors
neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1)

n_neighbors determines the K value used in the majority-vote rule, i.e., how many of the nearest points around a query point are consulted.

weights: This parameter is quite interesting. It controls how each nearest neighbor is weighted when classifying a point. Its default value is 'uniform', meaning equal weights, in which case the majority-vote rule decides the predicted class of the input instance. The alternative 'distance' weights each neighbor by the reciprocal of its distance, so the class of a point very close to the input instance is more persuasive than that of a farther point. For example, suppose the three data points closest to the query point are one of class A and two of class B, with the class A point very close to the query point and the two class B points slightly farther away. With uniform weighting, 3-NN classifies the query point as class B. With distance weighting, the class A point has a higher weight (because it is closer), and if its weight exceeds the sum of the two class B weights (the majority-vote rule counts votes; here we only need A's weight to exceed the total B weight), the algorithm classifies the query point as class A. Which option to choose depends on the application. Finally, the user can also pass a custom callable to define the weighting scheme.
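The A-versus-B scenario above can be reproduced directly. This is a minimal sketch with made-up coordinates chosen so that one class A point sits very close to the query while two class B points sit farther away:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data matching the example above: one class-A point very
# close to the query point, two class-B points noticeably farther away.
X = np.array([[1.0, 1.0], [3.0, 3.0], [3.5, 3.0]])
y = ['A', 'B', 'B']
query = [[1.2, 1.0]]  # distance 0.2 to the A point, ~2.7 and ~3.0 to the B points

uniform = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X, y)
distance = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X, y)

print(uniform.predict(query))   # majority vote: the two B neighbors win -> ['B']
print(distance.predict(query))  # reciprocal-distance weights favour A -> ['A']
```

The same three neighbors are consulted in both cases; only the weighting rule changes the verdict.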

algorithm selects the underlying neighbor-search method, one of {'auto', 'ball_tree', 'kd_tree', 'brute'}. In general 'auto' is sufficient: it automatically picks the most appropriate method.

p: As covered in the Machine Learning series, p=1 makes the distance metric the Manhattan distance, and p=2 the Euclidean distance. The default value is 2.
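Both are special cases of the Minkowski metric that KNeighborsClassifier uses by default. A quick sketch of the formula (the helper name minkowski is ours, not scikit-learn's):

```python
import numpy as np

# Minkowski distance of order p between two feature vectors:
# (sum of |a_i - b_i|^p) ** (1/p)
def minkowski(a, b, p):
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p) ** (1 / p)

a, b = [0, 0], [3, 4]
print(minkowski(a, b, 1))  # p=1, Manhattan distance: |3| + |4| = 7
print(minkowski(a, b, 2))  # p=2, Euclidean distance: sqrt(9 + 16) = 5
```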

Next, we call the fit() function to build a KNN model:

knn = neighbors.KNeighborsClassifier()
knn.fit(X, y)

Here X is array-like (see the example below): each row of X can be a tuple, a list, or a one-dimensional array, but note that all rows must have the same length (equal to the number of features). This is a very important point. We can think of X as a matrix in which each row holds the feature values of one input instance.

y is a list or one-dimensional array of the same length as X, where each element is the class label of the corresponding row in X.

The next step is to make predictions:

knn.predict(X)

The X passed here is again an array; for two-dimensional features it looks like [[0, 1], [2, 1], ...].

Probability estimates

knn.predict_proba(X)

The output is an array in which each element is the probability that the input instance belongs to the corresponding class. The column order follows the sorted order of the labels in y. If you pass more than one input instance, the output becomes the corresponding [[p1, p2], [p3, p4], ...].
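A small sketch of that column ordering, using made-up data. The fitted classifier exposes the sorted labels as the classes_ attribute, and the predict_proba columns line up with it:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: labels are deliberately given out of sorted order.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = ['B', 'B', 'A', 'A']
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

print(knn.classes_)                   # column order of predict_proba: ['A' 'B']
# The 3 nearest neighbors of [0, 0.2] are two B points and one A point,
# so the probabilities are 1/3 for A and 2/3 for B:
print(knn.predict_proba([[0, 0.2]]))  # [[0.33333333 0.66666667]]
```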

Accuracy: score

knn.score(X, y, sample_weight=None)

We typically divide the dataset into two parts, one for training the model and one for testing. This function measures the accuracy on the test part after training.
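That train-then-test workflow can be sketched with scikit-learn's train_test_split helper on the built-in iris dataset (our choice of dataset and split ratio, purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Split the iris data into a training part and a held-out test part,
# then measure accuracy on the unseen test samples.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_test, y_test))  # fraction of test samples predicted correctly
```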

A practical example

First we revisit the movie classification example from the KNN article in the Machine Learning series. There we implemented a KNN classifier by hand using the Euclidean distance; here we use the scikit-learn function directly, so the two can be compared.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Note: the inputs are arrays. If your data is a DataFrame (the usual
# result of importing a CSV file), pass X_train.values and y_train.values
# to fit() instead.
X_train = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
y_train = ['A', 'A', 'B', 'B']
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
knn.predict([[5, 0], [4, 0]])  # note: prediction also takes array form

