KNN, as the name implies, is the K-Nearest Neighbor classification algorithm. It is a typical "lazy" supervised learning algorithm: compute the distance from the sample to be predicted to every training sample, select the K training samples with the smallest distances as the neighbor group, and assign the prediction sample to whichever category holds the majority within that group.
Learning source: a PDF of someone else's study notes:
IV. KNN (k-Nearest Neighbor classification algorithm)

1. Algorithm idea: compute the distance from the sample to be classified to every training sample, take the k nearest training samples, and assign the sample to whichever category holds the majority among those k samples. The core idea: if most of the k nearest samples of a sample in feature space belong to one category, then the sample belongs to that category too and takes on the characteristics of the samples in it. The method decides the category of the sample to be classified only from the categories of the one or several nearest samples. Because KNN relies mainly on the limited number of surrounding samples rather than on a discriminant class-domain method to determine the category, it is better suited than other methods to sample sets whose class domains cross or overlap heavily.

2. Algorithm description:
1) Compute the distance: given the test object Item, compute its similarity to each document D1, D2, ..., Dj in the training set, obtaining Sim(Item, D1), Sim(Item, D2), ..., Sim(Item, Dj).
2) Find the neighbors: sort Sim(Item, D1), Sim(Item, D2), ..., Sim(Item, Dj), place a training object into the neighbor set NN if its similarity exceeds the threshold T, and locate the nearest K training objects as the K nearest neighbors of the test object.
3) Classify: take the top K objects from the neighbor set NN and, by majority, assign the test object to the main category of its K nearest neighbors.

3. Algorithm steps (a runnable sketch follows below):
step 1 --- initialize the distance to the maximum value;
step 2 --- compute the distance dist between the unknown sample and each training sample;
step 3 --- obtain maxdist, the current largest distance among the K nearest samples;
step 4 --- if dist is less than maxdist, add that training sample to the K-nearest-neighbor set;
step 5 --- repeat steps 2, 3 and 4 until the distances between the unknown sample and all training samples have been computed;
step 6 --- count the number of occurrences of each class label among the K nearest neighbors;
step 7 --- choose the most frequent class label as the class label of the unknown sample.

The algorithm involves three main factors: the training set, the distance (or similarity) measure, and the size of K.
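Since the notes describe these steps only in prose, here is a minimal from-scratch sketch of steps 1-7 (the function name knn_predict and the toy data are my own invention for illustration; Euclidean distance and majority voting are assumed):

import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, sample, k=3):
    # steps 1, 2 and 5: distance from the unknown sample to every training sample
    dists = np.sqrt(np.sum((train_X - sample) ** 2, axis=1))
    # steps 3 and 4: keep the k training samples with the smallest distances
    nearest = np.argsort(dists)[:k]
    # steps 6 and 7: count class labels among the k neighbors, return the most frequent
    return Counter(train_y[nearest]).most_common(1)[0][0]

train_X = np.array([[1, 1], [1, 2], [8, 8], [9, 8]])
train_y = np.array(['A', 'A', 'B', 'B'])
print(knn_predict(train_X, train_y, np.array([2, 2])))  # 'A': two of the three nearest neighbors are A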
4. The three basic elements of the k-nearest-neighbor model are the distance metric, the choice of the value of K, and the classification decision rule.

Distance metric: let the feature space $\chi$ be the $n$-dimensional real vector space $\mathbf{R}^n$, and let $x_i, x_j \in \chi$ with $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(n)})^T$ and $x_j = (x_j^{(1)}, x_j^{(2)}, \dots, x_j^{(n)})^T$. The $L_p$ distance between $x_i$ and $x_j$ is defined as

$L_p(x_i, x_j) = \left( \sum_{l=1}^{n} |x_i^{(l)} - x_j^{(l)}|^p \right)^{1/p}$, with $p \ge 1$.

When $p = 2$ it is the Euclidean distance: $L_2(x_i, x_j) = \left( \sum_{l=1}^{n} |x_i^{(l)} - x_j^{(l)}|^2 \right)^{1/2}$.

When $p = 1$ it is the Manhattan distance: $L_1(x_i, x_j) = \sum_{l=1}^{n} |x_i^{(l)} - x_j^{(l)}|$.

When $p = \infty$ it is the maximum of the per-coordinate distances: $L_\infty(x_i, x_j) = \max_l |x_i^{(l)} - x_j^{(l)}|$.
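To make the formulas concrete, and to preview the K-selection advice in the FAQ below, here is a small sketch (assuming a recent sklearn; the vectors and the use of the built-in iris dataset are illustrative, not from the notes). It evaluates the three L_p distances and scans K from 1 up to the rule-of-thumb bound sqrt(n) by cross-validation:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# the three L_p distances on two made-up vectors
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])
print(np.sum(np.abs(x - y)))          # L_1 (Manhattan): 5.0
print(np.sqrt(np.sum((x - y) ** 2)))  # L_2 (Euclidean): ~3.61
print(np.max(np.abs(x - y)))          # L_inf (Chebyshev): 3.0

# choose K by 5-fold cross-validation, scanning 1 <= K <= sqrt(n)
X, labels = load_iris(return_X_y=True)
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, labels, cv=5).mean()
          for k in range(1, int(len(X) ** 0.5) + 1)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])  # the K with the best cross-validated accuracy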
5. Advantages and disadvantages of the algorithm

1) Advantages:
- simple, easy to understand and to implement; no parameters to estimate, no training phase;
- suitable for classification problems with a large sample size;
- especially suitable for multi-class problems (multi-modal objects carrying several category labels); for example, when classifying genes by their functional characteristics, KNN performs better than SVM.

2) Disadvantages:
- a lazy algorithm: classifying a test sample has a large memory overhead and is slow to evaluate;
- poor interpretability: it cannot produce rules the way a decision tree does;
- for classification problems with a small sample size, it tends to misclassify.

6. Frequently asked questions

1) How large should K be set? If K is too small, the classification result is easily affected by noise points; if K is too large, the nearest neighbors may include many points of other categories. (Weighting neighbors by distance can reduce the sensitivity to the K setting.) The value of K is usually determined by cross-validation, with K = 1 as the baseline; see the sketch after the distance formulas above. Rule of thumb: K is generally lower than the square root of the number of training samples.

2) How is the category best determined? Majority voting ignores how near each of the K neighbors is, yet closer neighbors should arguably have more say in the final classification, so distance-weighted voting is more appropriate.

3) How to choose a suitable distance measure? The impact of high dimensionality: it is well known that the more variables there are, the worse the discriminating power of the Euclidean distance. The impact of the variables' value ranges: variables with larger ranges tend to dominate the computation, so the data should be standardized first.

4) Should all training samples be treated equally? Some samples in the training set may be more trustworthy than others. Different weights can be applied to different samples, strengthening the reliance on trusted samples and reducing the influence of untrusted ones.

5) Performance. KNN is a lazy algorithm: it does not study ahead of the exam and only crams when the test comes (it does no work until a test sample must be classified, and only then searches for the k nearest neighbors). The consequence of laziness is that building the model is cheap, but classifying a test sample is expensive, since it requires scanning all training samples and computing distances. There are ways to improve the efficiency of the computation, for example compressing the training set.

6) Can the number of training samples be drastically reduced while maintaining classification accuracy? Condensing techniques and editing techniques address this.

7. Python implementation example of the KNN algorithm: movie classification

Movie name                 | Fights | Kisses | Movie type
California Man             | 3      | 104    | Romance
He's Not Really into Dudes | 2      | 100    | Romance
Beautiful Woman            | 1      | 81     | Romance
Kevin Longblade            | 101    | 10     | Action
Robo Slayer                | 99     | 5      | Action
Amped II                   | 98     | 2      | Action
unknown                    | 18     | 90     | ?

Task description: determine the type of the unknown movie from its numbers of fights and kisses, calling Python's sklearn module to solve it.

import numpy as np
from sklearn import neighbors

knn = neighbors.KNeighborsClassifier()  # get a KNN classifier
# each row of data holds the fight and kiss counts of one movie
data = np.array([[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]])
# labels holds the corresponding types: 1 = Romance, 2 = Action
labels = np.array([1, 1, 1, 2, 2, 2])
knn.fit(data, labels)  # train on the data
# Out: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
#      metric_params=None, n_jobs=1, n_neighbors=5, p=2, weights='uniform')
knn.predict([[18, 90]])
# Out: array([1])

Explanation: the integers 1 and 2 in the labels array stand for Romance and Action. The notes use int labels on the grounds that sklearn rejects character arrays; in fact KNeighborsClassifier also accepts string labels directly, as the later example shows, so the choice simply means that post-processing has to map 1 and 2 back to Romance and Action. fit() trains on data and labels: each row of data is the vector of fight and kiss counts for one movie, called its features, and labels records the type of movie each row represents. Calling predict() with the feature vector of the unknown movie returns the type it belongs to. Here the result is 1, which means the unknown movie is a Romance.
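As a small follow-up to the mapping mentioned in the explanation, here is a sketch of that post-processing step (label_names is an illustrative helper, not part of the original notes; knn is the classifier trained above):

label_names = {1: 'Romance', 2: 'Action'}  # illustrative mapping back to type names
prediction = knn.predict([[18, 90]])[0]    # -> 1
print('The unknown movie is a %s' % label_names[prediction])  # Romance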
Now let me pad this out with a quick example based on made-up star stats (too lazy to scrape real data; this could grow into a series later, so consider it a pit I dig now and fill in afterwards): predict whether a player's position is center or guard (yes, I am talking about Rondo):
No more nonsense. Based on the four players' data, let's predict whether I am really a big center:
Problem encountered: ValueError: Expected n_neighbors <= n_samples, but n_samples = 4, n_neighbors = 5. The number of training samples was smaller than the classifier's default number of neighbors (not the dimension), so I added a few more samples; alternatively, K itself can be lowered, as sketched below.
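For reference, a sketch of that other way out (hypothetical here, since I chose to add samples instead): KNeighborsClassifier defaults to n_neighbors=5, so asking for fewer neighbors also satisfies the check.

# ask for only 3 neighbors so that n_neighbors <= n_samples holds with 4 samples
knn = neighbors.KNeighborsClassifier(n_neighbors=3)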
# -*- coding: utf-8 -*-
# sklearn ships a ready-made KNN framework, so there is no need to roll our own,
# although I have written a similar recommendation algorithm before.
from sklearn import neighbors
import numpy as np

# build the KNN model (KNN classifier)
knn = neighbors.KNeighborsClassifier()
# training samples
data = np.array([[25, 10, 2, 0, 5], [35, 5, 10, 3, 1], [32, 15, 2, 1, 5],
                 [21, 1, 15, 1, 0], [20, 15, 1, 2, 8], [30, 10, 10, 1, 8]])
# labels: the values to predict (positions)
locations = np.array(['C', 'PG', 'C', 'PG', 'C', 'PG'])
# train the model
knn.fit(data, locations)
# make the prediction
print 'Rondo is %s' % knn.predict([15, 15, 10, 8, 0])[0]

# Run results:
# C:\Python27\lib\site-packages\sklearn\utils\validation.py:395: DeprecationWarning:
# Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19.
# Reshape your data either using X.reshape(-1, 1) if your data has a single feature
# or X.reshape(1, -1) if it contains a single sample.
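The warning only concerns the shape of the argument: from sklearn 0.19 on, predict() requires each sample to be a row of a 2D array. A minimal fix for the last line of the script above (same Python 2 style as the surrounding code):

# pass the sample as one row of a 2D array to silence the DeprecationWarning
print 'Rondo is %s' % knn.predict([[15, 15, 10, 8, 0]])[0]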
Pretty watery, isn't it? Yes, the verdict is that I really am a big center. Don't hit me.
Using the KNN algorithm to determine a star player's playing style (very watery)