Concept
1. Supervised learning: learn a function from labeled training data, then use that function to label new data.
2. Unsupervised learning: learn a function from unlabeled training data, then use that function to label all of the data (for example, by grouping it into clusters).
KNN classification algorithm: by analyzing a training data set whose classes are known, it finds a classification rule. A new sample is assigned the class that is most common among its k nearest training samples. Classification algorithms are a form of supervised learning.
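To make the idea concrete, here is a minimal from-scratch sketch of classifying one point by its nearest neighbors. The helper name knn_one, the toy data, and the choice of Euclidean distance are assumptions made for illustration; this is not how the class package implements it.

```r
# Hypothetical helper (not part of any package): classify one new point.
# train  : numeric matrix, one row per training sample
# labels : class labels of the training rows
# point  : numeric vector with one value per column of train
knn_one <- function(train, labels, point, k = 3) {
  train <- as.matrix(train)
  # Subtract the point from every row, then take the Euclidean distance
  diffs <- sweep(train, 2, point)
  dists <- sqrt(rowSums(diffs^2))
  # Row indices of the k closest training samples
  nearest <- order(dists)[seq_len(k)]
  # Majority vote among their labels (on a tie, the first label in table order wins)
  votes <- table(labels[nearest])
  names(votes)[which.max(votes)]
}

# Toy data: three points near (1, 1) labeled "a", three near (8, 8) labeled "b"
train <- rbind(c(1, 1), c(1, 2), c(2, 1), c(8, 8), c(8, 9), c(9, 8))
labels <- c("a", "a", "a", "b", "b", "b")

pred_low <- knn_one(train, labels, c(1.5, 1.5), k = 3)
pred_high <- knn_one(train, labels, c(8.5, 8.5), k = 3)
print(c(pred_low, pred_high))
```

In practice the knn function from the class package does this work for a whole test set at once.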
KNN concepts:
1. Training set: data used to train the model or determine the parameters of the model.
2. Test set: data used to verify the accuracy of the model.
3. Cross-validation: typically 70% of the data is used as the training set and the remaining 30% as the test set; the predictions on the test set are checked against the true classes in a cross table (contingency table).
Sampling method
sample(x, size, replace = FALSE)
x: the vector of elements to sample from
size: the number of elements to draw
replace: whether a drawn element is put back and can be drawn again; defaults to FALSE
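A short sketch of how the replace argument changes the result. The seed value and variable names are arbitrary choices for illustration.

```r
# Fix the random seed so the draws are reproducible (the value is arbitrary)
set.seed(42)

x <- 1:10

# Draw 3 values without replacement: no value can appear twice
without_replacement <- sample(x, size = 3, replace = FALSE)

# Draw 15 values from only 10: this is possible only with replace = TRUE
with_replacement <- sample(x, size = 15, replace = TRUE)

print(without_replacement)
print(with_replacement)
```

Asking for more elements than x contains while replace = FALSE raises an error, which is why the 70% training sample is drawn without replacement from the row numbers.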
KNN method (requires the class package)
Installation: install.packages("class")
knn(train, test, cl, k = 1)
train: the training data
test: the test data
cl: the correct classes of the training data
k: the number of nearest neighbors to use; defaults to 1. You can try different values and cross-check the results until you find the best one.
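One way to look for a good k is to run knn with several candidate values and compare the share of correct predictions on the test set. The sketch below assumes the built-in iris data set, a 70/30 split, and candidate values 1 to 10; all variable names are illustrative.

```r
library(class)

set.seed(1)  # arbitrary seed, only for reproducibility

# 70/30 split of the row numbers
n <- nrow(iris)
idx <- sample(1:n, round(n * 0.7))

# Columns 1 to 4 are the measurements; Species is the class
train_x <- iris[idx, 1:4]
test_x <- iris[-idx, 1:4]
train_y <- iris$Species[idx]
test_y <- iris$Species[-idx]

# Accuracy on the test set for each candidate k
candidate_k <- 1:10
accuracy <- sapply(candidate_k, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  mean(pred == test_y)
})

best_k <- candidate_k[which.max(accuracy)]
print(data.frame(k = candidate_k, accuracy = accuracy))
print(best_k)
```

Because k is chosen using the same test set that measures accuracy, the reported accuracy for the best k is likely to be somewhat optimistic.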
Example:
#install.packages("class")
library(class)
# https://en.wikipedia.org/wiki/iris_flower_data_set
# https://zh.wikipedia.org/wiki/%e5%ae%89%e5%be%b7%e6%a3%ae%e9%b8%a2%e5%b0%be%e8%8a%b1%e5%8d%89%e6%95%b0%e6%8d%ae%e9%9b%86
# Count the rows of iris; iris is an example data set that ships with R
total <- nrow(iris)
# Sample 70% of the row numbers
index <- sample(1:total, total * 0.7)
# Get the training set from the sampled row numbers
iris.train <- iris[index, ]
# Get the test set from the sampled row numbers; -index selects the remaining 30%
iris.test <- iris[-index, ]
# Run knn on the training set and the test set
# subset() with select = -Species drops the Species column (the result column) from both sets
result.KNN <- knn(train = subset(iris.train, select = -Species), test = subset(iris.test, select = -Species), cl = iris.train$Species, k = 2)
# Cross-check the predictions against the true classes
table(iris.test$Species, result.KNN)
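The cross table can also be reduced to a single accuracy figure: the diagonal cells are the correct predictions. The sketch below repeats the example's steps so that it runs on its own; the seed and the names confusion and accuracy are assumptions for illustration.

```r
library(class)

set.seed(123)  # arbitrary seed, only for reproducibility

total <- nrow(iris)
index <- sample(1:total, round(total * 0.7))
iris.train <- iris[index, ]
iris.test <- iris[-index, ]

result.KNN <- knn(train = iris.train[, 1:4],
                  test = iris.test[, 1:4],
                  cl = iris.train$Species,
                  k = 2)

# Rows are the true classes, columns are the predicted classes
confusion <- table(actual = iris.test$Species, predicted = result.KNN)

# Diagonal cells are correct predictions; everything else is an error
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

Off-diagonal cells show which classes were confused with each other, which a single accuracy number hides.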
Learning R: the KNN (k-nearest neighbors) algorithm