knn-pseudo code and implementation process

Source: Internet
Author: User

KNN Features

Advantages: High precision, unknown sense of outliers, no data input jiading

Cons: High computational complexity, high spatial complexity

Scope of application: Numerical type and nominal type

The pseudo-code of KNN algorithm

1. Calculate the distance between the points in the data set of the known categories and the current

2. Sort by distance increment order

3, select the distance from the current point of the 6, the small K points

4. Determine the frequency of the category in which the first K points are present

5. The category of the highest frequency of the first K points is returned as the predicted classification of the current point

Example: KNN nearest neighbor algorithm improves matching records for dating sites

1. Collect data: Provide text file

* Normalization of data

2, prepare data: Python parsing data: including data entry and call KNN algorithm

3. Analysis data: Use Matplotlib to draw two-bit scatter plot

4, Training algorithm: This step does not apply K-NN algorithm

5, the test algorithm: the use of Helen Tong part of the data as a test sample; the difference between a test sample and a non-test sample is that the test sample is the data that has been sorted, and if the forecast classification differs from the actual category, the error is flagged.

* Normalized processing characteristics (characteristic values that can be used in the transformation classification)

* Calculate the number of test vectors (which determine which data to test, which is used for training samples), and then enter into the original KNN classifier function classfy0, and finally calculate the error rate, and output

6, using the algorithm: Generate Raw_input, input some of the obtained special value, generate two-bit scatter plot, and use color marking method to process

7, note: Related to the numerical normalization problem, KNN's disadvantage is unable to give any data infrastructure information, new words we do not know what the average sample sample and typical instance samples have any characteristics, but the probability measurement method can deal with classification problems.

Handwriting System Examples:

1. Collect Data: Extract files

2, prepare the data: Write the function classfy0 (), convert the picture to the list format used by the classifier

* Convert binary graphic proofs to 1*1024 vectors

* Open the file, loop the first 32 lines of the file, and store the first 32 character values of each line in NumPy

3. Analyze data: Check the data at the Python command prompt to ensure that the requirements are met

* Get directory Content

* Parse digital classification from filename, mainly match the vector similarity of training set

4. Training data: KNN not suitable

5, test data: Write the function to use the provided part of the data and as a test sample, the difference between the test sample and the cost test sample is that the test sample is completed into the class data, if the forecast classification and the actual category is different, it is marked as an error

6, using the algorithm:

knn-pseudo code and implementation process

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.