KNN Features
Advantages: high accuracy, insensitive to outliers, no assumptions about the input data
Disadvantages: high computational complexity, high space complexity
Applicable data types: numerical and nominal
Pseudocode of the KNN algorithm
1. Calculate the distance between each point in the dataset of known categories and the current point
2. Sort the points in order of increasing distance
3. Select the K points closest to the current point
4. Determine the frequency of each category among those K points
5. Return the category that appears most frequently among the K points as the predicted classification of the current point
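A minimal NumPy sketch of these five steps, in the spirit of the classify0 function referenced in the examples below; the Euclidean distance, the array shapes, and the tie-breaking via Counter are assumptions of this sketch:

```python
import numpy as np
from collections import Counter

def classify0(in_x, data_set, labels, k):
    """Classify in_x against data_set (one known point per row) with KNN."""
    # 1. distance from in_x to every point in the known-category dataset
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    # 2-3. sort by increasing distance and keep the K closest points
    nearest = distances.argsort()[:k]
    # 4. count how often each category appears among those K points
    votes = Counter(labels[i] for i in nearest)
    # 5. return the most frequent category as the prediction
    return votes.most_common(1)[0][0]

# tiny usage example with made-up points
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify0(np.array([0.2, 0.1]), group, labels, 3))  # -> 'B'
```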
Example: using the KNN algorithm to improve the matching results of a dating site
1. Collect data: the data is provided as a text file
* Normalize the data
2. Prepare data: parse the text file with Python, including reading in the data and calling the KNN algorithm
3. Analyze data: use Matplotlib to draw a two-dimensional scatter plot
4. Train the algorithm: this step does not apply to KNN
5. Test the algorithm: use part of the data provided by Helen as test samples; the difference between test samples and non-test samples is that test samples have already been classified, and if the predicted classification differs from the actual category it is flagged as an error
* Normalize the feature values (converting them into values the classifier can use)
* Calculate the number of test vectors (which determines which data are used for testing and which as training samples), feed them into the original KNN classifier function classify0, then compute and output the error rate (the normalization and error-rate test are sketched after this list)
6. Use the algorithm: read input with raw_input, enter the feature values obtained, generate a two-dimensional scatter plot, and distinguish the classes with color markers
7. Note: this involves the numerical normalization problem. A drawback of KNN is that it cannot provide any information about the underlying structure of the data, so we do not know what characteristics an average sample or a typical instance sample has; probabilistic measurement methods, by contrast, can also handle classification problems.
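A compact sketch of the normalization and hold-out test from step 5 above, assuming the text file has already been parsed into a NumPy matrix data_mat plus a label list (e.g. with np.loadtxt) and reusing the classify0 sketch from the pseudocode section; the 10% hold-out ratio and k=3 are illustrative defaults, not values from these notes:

```python
import numpy as np

def auto_norm(data_set):
    """Scale every feature column into [0, 1]: new = (old - min) / (max - min)."""
    min_vals = data_set.min(axis=0)
    ranges = data_set.max(axis=0) - min_vals
    return (data_set - min_vals) / ranges, ranges, min_vals

def dating_class_test(data_mat, labels, ho_ratio=0.10, k=3):
    """Use the first ho_ratio of the rows as test samples and report the error rate."""
    norm_mat, _, _ = auto_norm(data_mat)
    num_test = int(norm_mat.shape[0] * ho_ratio)   # number of test vectors
    errors = 0
    for i in range(num_test):
        # the remaining rows act as the training samples
        result = classify0(norm_mat[i], norm_mat[num_test:], labels[num_test:], k)
        if result != labels[i]:
            errors += 1                            # predicted class differs from actual
    error_rate = errors / float(num_test)
    print("total error rate: %.3f" % error_rate)
    return error_rate
```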
Example: handwriting recognition system
1. Collect data: extract the provided text files
2. Prepare data: write the classifier function classify0() and convert the images into the vector format the classifier uses
* Convert the 32*32 binary image into a 1*1024 vector
* Open the file, loop over the first 32 lines, and store the first 32 character values of each line in a NumPy array
3. Analyze data: check the data at the Python command prompt to make sure it meets the requirements
* Get the directory contents
* Parse the digit class from each filename; classification then works by matching the vector's similarity against the training set
4. Train the algorithm: this step does not apply to KNN
5. Test the algorithm: write a function that uses part of the provided data as test samples; the difference between test samples and non-test samples is that test samples have already been classified, and if the predicted classification differs from the actual category it is flagged as an error (see the sketch after this list)
6. Use the algorithm: follow the KNN pseudocode and implementation process described above
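A sketch of the image-to-vector conversion and the hold-out test for the handwriting example, again reusing the classify0 sketch above; the two-directory layout and the "digit_index.txt" filename convention (e.g. 9_45.txt) are assumptions made for illustration:

```python
import os
import numpy as np

def img2vector(filename):
    """Convert a 32x32 text image of 0/1 characters into a 1x1024 vector."""
    vect = np.zeros((1, 1024))
    with open(filename) as f:
        for row in range(32):              # loop over the first 32 lines
            line = f.readline()
            for col in range(32):          # store the first 32 characters of each line
                vect[0, 32 * row + col] = int(line[col])
    return vect

def handwriting_class_test(training_dir, test_dir, k=3):
    """Parse the digit class from each filename, then classify the test files with KNN."""
    training_files = os.listdir(training_dir)          # get the directory contents
    labels = [int(name.split('_')[0]) for name in training_files]
    training_mat = np.zeros((len(training_files), 1024))
    for i, name in enumerate(training_files):
        training_mat[i, :] = img2vector(os.path.join(training_dir, name))
    test_files = os.listdir(test_dir)
    errors = 0
    for name in test_files:
        actual = int(name.split('_')[0])
        predicted = classify0(img2vector(os.path.join(test_dir, name))[0],
                              training_mat, labels, k)
        if predicted != actual:
            errors += 1                                # flag the misclassified sample
    print("error rate: %.3f" % (errors / float(len(test_files))))
```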