VI. More Hyper-parameters: Grid Search and the k-Nearest Neighbors Algorithm
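No body survived under this heading, so the sketch below shows what a grid search over KNN's additional hyper-parameters (the neighbor-weighting scheme and the Minkowski exponent p) typically looks like. It assumes scikit-learn's GridSearchCV and the bundled digits dataset; the variable names and parameter ranges are illustrative, not from the original.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=666)

# Two sub-grids: the Minkowski exponent p is searched only together
# with distance weighting, a common pattern in KNN tuning.
param_grid = [
    {'weights': ['uniform'],
     'n_neighbors': list(range(1, 11))},
    {'weights': ['distance'],
     'n_neighbors': list(range(1, 11)),
     'p': list(range(1, 6))},
]

grid_search = GridSearchCV(KNeighborsClassifier(), param_grid,
                           n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
print(grid_search.best_score_)
print(grid_search.best_estimator_.score(X_test, y_test))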
VII. Data Normalization (Feature Scaling)
Solution: Map all data to the same scale
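The two standard schemes are min-max normalization, which maps each feature into [0, 1], and mean-variance standardization (the approach implemented in the next section), which maps each feature to zero mean and unit variance. A minimal NumPy sketch of both, on a made-up sample:

import numpy as np

x = np.random.randint(0, 100, size=50).astype(float)

# Min-max normalization: (x - min) / (max - min), values land in [0, 1]
x_minmax = (x - np.min(x)) / (np.max(x) - np.min(x))

# Mean-variance standardization: (x - mean) / std, result has mean 0 and std 1
x_standard = (x - np.mean(x)) / np.std(x)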
VIII. The Scaler in Scikit-learn
preprocessing.py
import numpy as np

class StandardScaler:

    def __init__(self):
        self.mean_ = None
        self.scale_ = None

    def fit(self, X):
        """Compute the mean and standard deviation of each feature from the training set X"""
        assert X.ndim == 2, "The dimension of X must be 2"

        self.mean_ = np.array([np.mean(X[:, i]) for i in range(X.shape[1])])
        self.scale_ = np.array([np.std(X[:, i]) for i in range(X.shape[1])])
        return self

    def transform(self, X):
        """Standardize X to zero mean and unit variance according to this StandardScaler"""
        assert X.ndim == 2, "The dimension of X must be 2"
        assert self.mean_ is not None and self.scale_ is not None, \
            "must fit before transform!"
        assert X.shape[1] == len(self.mean_), \
            "the feature number of X must be equal to mean_ and std_"

        resX = np.empty(shape=X.shape, dtype=float)
        for col in range(X.shape[1]):
            resX[:, col] = (X[:, col] - self.mean_[col]) / self.scale_[col]
        return resX
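A quick usage sketch: fit the scaler on the training set only, then apply the same transform to both training and test data, so that test-set statistics never leak into the model. The same interface works with scikit-learn's own preprocessing.StandardScaler; the dataset and variable names here are illustrative.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=666)

scaler = StandardScaler()            # the class defined above
scaler.fit(X_train)                  # statistics come from the training set only
X_train_standard = scaler.transform(X_train)
X_test_standard = scaler.transform(X_test)   # reuses the training-set mean_ and scale_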
IX. More Thoughts on the k-Nearest Neighbors Algorithm
Advantages:
It can solve classification problems, including multi-class classification out of the box; the idea is simple, and the results are strong.
Disadvantages: the model is highly data-dependent; the predictions are not interpretable; and it suffers from the curse of dimensionality.