1. Introduction
When running a machine learning program, and especially when tuning a network, there are usually many hyper-parameters to adjust, and their combinations quickly become complicated. Following the principle attention > time > money, tuning by hand costs too much attention and is not worth it. A for loop, or a for-loop-like approach, is constrained by its rigid nesting levels: it is neither concise nor flexible, it still costs a lot of attention, and it is error-prone. This article introduces the GridSearchCV class of the scikit-learn (sklearn) package, which automatically searches over the models produced by different parameter combinations within a specified range, effectively freeing up attention.
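To make the cost of the manual approach concrete, the following minimal sketch enumerates every combination of a small grid using only the standard library. The grid values and the helper name `expand_grid` are assumptions chosen for illustration; this only shows the bookkeeping that a grid search automates and is not how scikit-learn implements it.

```python
from itertools import product

# Hypothetical grid, chosen only for illustration.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [1, 2, 4],
    "gamma": [0.125, 0.25],
}


def expand_grid(grid):
    """Return every combination of the grid as a list of dicts."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_grid(param_grid)
print(len(combinations))  # 2 kernels * 3 values of C * 2 values of gamma
print(combinations[0])
```

Each added hyper-parameter multiplies the number of combinations, and each combination still needs to be trained, cross-validated, and recorded, which is the part that becomes error-prone when written by hand.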
2. Introduction to the GridSearchCV module
GridSearchCV lives in the model_selection sub-module of sklearn, and importing it is very simple:
    from sklearn.model_selection import GridSearchCV
Function prototype:
    class sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True)
Here cv can be an integer, a cross-validation generator, or an iterable. The four kinds of input for the cv parameter are listed below:
- None: the default; the function uses its default 3-fold cross-validation (this is the default for the version of the prototype shown above; newer scikit-learn releases default to 5-fold)
- Integer k: k-fold cross-validation. For classification tasks, StratifiedKFold is used (class balancing: each fold keeps roughly the same proportion of samples per class; see the official documentation for details). For other tasks, KFold is used
- Cross-validation generator: a splitter object that you construct yourself
- An iterable that yields (training set, test set) splits
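The explicit input kinds listed above can be sketched as follows. This is a minimal illustration, assuming a reasonably recent scikit-learn; the small grid `{"C": [1, 4]}`, the choice of 4 folds, and the variable names are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = {"C": [1, 4]}  # small illustrative grid (assumption)

# Integer: k-fold cross-validation (stratified, because SVC is a classifier).
by_int = GridSearchCV(SVC(), grid, cv=4).fit(X, y)

# Cross-validation generator: a splitter object built by the caller.
splitter = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
by_splitter = GridSearchCV(SVC(), grid, cv=splitter).fit(X, y)

# Iterable of (train_indices, test_indices) pairs, here precomputed as a list.
splits = list(splitter.split(X, y))
by_iterable = GridSearchCV(SVC(), grid, cv=splits).fit(X, y)

# n_splits_ reports how many folds each search actually used.
print(by_int.n_splits_, by_splitter.n_splits_, by_iterable.n_splits_)
```

Passing a precomputed list of splits is useful when several searches must be compared on exactly the same folds.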
3. Automatic saving of analysis results
Comma-separated values (CSV, sometimes also called character-delimited values, because the delimiter need not be a comma) files store tabular data (numbers and text) in plain text. Plain text means that the file is a sequence of characters and contains no data that must be interpreted as binary numbers. A CSV file consists of any number of records separated by newline characters; each record consists of fields, and the fields are separated by some other character or string, most commonly a comma or a tab. Typically, all records have exactly the same sequence of fields.
A prominent advantage of CSV files is that they can be opened with Excel and similar software. Compared with Notepad, or with the interfaces of MATLAB, Python, and other programming languages, this makes it easy to view the data, produce reports, and post-process the results.
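The record and field structure described above can be sketched with Python's standard csv module. The rows below are made-up placeholder values, not real results, and the helper name `round_trip` is hypothetical; the example only shows that the same table can be written with a comma or a tab as the delimiter and read back unchanged.

```python
import csv
import io

# Placeholder records (not real results): every row has the same fields.
rows = [
    ["kernel", "C", "mean_test_score"],
    ["linear", "1", "0.98"],
    ["rbf", "4", "0.97"],
]


def round_trip(records, delimiter):
    """Write records as delimited text, then parse that text back."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(records)
    text = buffer.getvalue()
    parsed = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return text, parsed


comma_text, comma_rows = round_trip(rows, ",")
tab_text, tab_rows = round_trip(rows, "\t")
print(comma_text)
```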
In GridSearchCV, the hyper-parameter combinations and their results are stored as a dictionary in clf.cv_results_ (where clf is the fitted GridSearchCV object). The pandas module provides an efficient way to organize this data, so saving it takes only 3 lines of code:
    cv_result = pd.DataFrame.from_dict(clf.cv_results_)
    with open('cv_result.csv', 'w') as f:
        cv_result.to_csv(f)
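Before writing the table to disk, it can help to look at a few of its columns. The sketch below is one possible way to do that, assuming a recent scikit-learn and pandas; the small grid and the choice of columns are assumptions for illustration, while the column names themselves are keys that GridSearchCV stores in cv_results_.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Small illustrative grid (assumption): 2 kernels * 3 values of C = 6 candidates.
clf = GridSearchCV(SVC(), {"kernel": ["linear", "rbf"], "C": [1, 2, 4]})
clf.fit(X, y)

cv_result = pd.DataFrame(clf.cv_results_)

# Keep the most informative columns and put the best-ranked candidate first.
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
summary = (
    cv_result[columns]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)
print(summary.head())
```

The full table also contains per-fold scores and timing columns, which is why saving all of it to CSV is convenient for later reporting.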
4. Complete routine
The code is clear and understandable without explanation. https://github.com/JiJingYu/tensorflow-exercise/tree/master/svm_grid_search
    import pandas as pd
    from sklearn import svm, datasets
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import classification_report

    # Load the data and define the hyper-parameter grid to search.
    iris = datasets.load_iris()
    parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 2, 4], 'gamma': [0.125, 0.25, 0.5, 1, 2, 4]}

    # Search the grid with cross-validation, using all available CPU cores.
    svr = svm.SVC()
    clf = GridSearchCV(svr, parameters, n_jobs=-1)
    clf.fit(iris.data, iris.target)

    # Save the full table of results to a CSV file.
    cv_result = pd.DataFrame.from_dict(clf.cv_results_)
    with open('cv_result.csv', 'w') as f:
        cv_result.to_csv(f)

    print('The parameters of the best model are:')
    print(clf.best_params_)

    # Evaluate the refitted best model (here on the training data itself).
    y_pred = clf.predict(iris.data)
    print(classification_report(y_true=iris.target, y_pred=y_pred))