Using KNN neighbor algorithm to predict data of machine learning

Source: Internet
Author: User
Tags function prototype jupyter notebook

The first half is the introduction, the latter part is the case

KNN Nearest Neighbor algorithm:
Simply put, the method of measuring the distance between different eigenvalues is used to classify (k-nearest NEIGHBOR,KNN)

Advantages: High accuracy, insensitive to outliers, no data input assumptions
Cons: High complexity of time and space

    • 1, when the sample is unbalanced, such as a class of sample capacity is very small, the sample size of other classes, when you enter a sample, the most of the k near values are large sample capacity of that class, this can lead to classification errors. The improved method is to weighted the K-nearest point, that is, the weight of the point near the distance is large, and the point weight of the distance is small.
    • 2, the calculation is large, each sample to be classified to calculate it to the distance of all points, according to the order of distance to find K adjacent points, the improved method is: first to the known sample points for editing, in advance to remove the small sample of the role of the classification.

Applicable data range:

    • Nominal type (discrete type): The result of a nominal target variable is only used in a limited target set, such as true and false (nominal target variable is mainly for classification)
    • Numeric: A numerical target variable can be used to extract values from an infinite set of values, such as 0.100,42.001 (numerical target variables are mainly for regression analysis)

Working principle:

    • There is a collection of sample data, also called a training sample set, in the training sample set, and there is a label for each data in the sample set, that is, we know the correspondence between each data in the sample set and the owning category. After losing new data with no tags, each feature of the new data is compared with the feature in the sample set, and then the algorithm extracts the classification label of the most similar data (nearest neighbor) in the sample set. In general, we only select the first k most similar data in the sample data set, which is the source of K in the K-nearest neighbor algorithm, usually K is an integer not greater than 20. Finally, select the most frequently occurring classification of the K most similar data as the classification of the new data.
    • The movie category KNN analysis (image from the network)

    • Euclidean distance (Euclidean Distance, Euclidean metric)

    • Calculation process Diagram

    • Case
      The code is written in Jupyter notebook.
1 ImportNumPy as NP2 ImportPandas as PD3  fromPandasImportSeries,dataframe4 ImportMatplotlib.pyplot as Plt5%Matplotlib Inline6 #the above imported packages are self-habitual import, because it is possible to use at any time, the first time to import these all7 8 #here I wrote an Excel form, easy to read the data quickly, demo use, do not have series or dataframe write9Film = Pd.read_excel ('films.xlsx', sheet_name=1)Ten #table appears after entering film OneFil

1 #sample characteristics of a movie2train=film[['Action Lens','Kissing Lens']]   3 #The sample label, which is the label to be predicted, is here to predict what category of movies The new data belongs to4target=film['Movie Category']   5 #Create a machine learning model that needs to be imported6  fromSklearn.neighborsImportKneighborsclassifier7 #create objects, where the data is discrete, so use Kneighborsclassifier,8knn=Kneighborsclassifier ()9 #Training The KNN model, passing in sample features and sample labelsTen #Constructing function prototype, constructing loss function and finding the optimal solution of loss function One Knn.fit (train,target) AKnn

When you enter KNN, the following code appears, indicating that the training is complete

1 kneighborsclassifier (algorithm='auto', leaf_size=30, metric='  Minkowski',2            metric_params=none, N_jobs=1, n_neighbors=5, p=2, 3            weights='uniform')
1 # You can write 3 sample data here, according to the dimension of the sample data. 2 Cat=np.array ([[5,19],[21,6],[23,24]])3#  cat=np.array ([[21,4]])  can also write 1 4#  Use the Predict function to predict data 5 knn.predict (CAT)

The run will appear:

Forecast Complete! Successfully judged the attribution category of 3 new samples
Next, you can also draw a diagram to visually view your neighbor's situation

1 # Scatter draw a scatter chart, take data to use. Values, two-dimensional arrays, one-dimensional all taken out, two-dimensional take 0, indicating that is [:, 0] 2 plt.scatter (train.values[:,0],train.values[:,1])3#  Scatter can have some properties, Color below allows you to customize the colors displayed 4 plt.scatter (cat[:,0],cat[:,1],color='red')

For:

When using the KNN neighbor algorithm, be aware of the sample set, sample characteristics, sample labels

Technical Exchange can comment on the message Oh! Learn humbly, do not forget beginner's mind, together forge ahead!

Using KNN neighbor algorithm to predict data of machine learning

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.