Comparing randomized search and grid search for hyperparameter estimation
Compare randomized search and grid search for optimizing the hyperparameters of a random forest. All parameters that influence the learning are searched simultaneously (except for the number of estimators, which poses a time/quality tradeoff).
The randomized search and the grid search explore exactly the same space of parameters. The resulting parameter settings are quite similar, while the run time for randomized search is drastically lower.
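The run-time gap follows largely from how many candidates each search evaluates. As a rough sketch using only the standard library (not scikit-learn itself), the snippet below enumerates the full grid from the script and draws 20 random settings from the same ranges. The helper names `grid_candidates` and `random_candidates` are hypothetical, and `random.Random` stands in for the `scipy.stats` distributions used in the script.

```python
import itertools
import random

# Grid values copied from the example script.
param_grid = {
    "max_depth": [3, None],
    "max_features": [1, 3, 10],
    "min_samples_split": [1, 3, 10],
    "min_samples_leaf": [1, 3, 10],
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}


def grid_candidates(grid):
    """Enumerate every combination of values, as a grid search would."""
    names = sorted(grid)
    combos = itertools.product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def random_candidates(n_iter, seed=0):
    """Draw n_iter settings from the same ranges (illustrative stand-in)."""
    rng = random.Random(seed)
    return [
        {
            "max_depth": rng.choice([3, None]),
            # randint(1, 10) is inclusive, mirroring sp_randint(1, 11)
            "max_features": rng.randint(1, 10),
            "min_samples_split": rng.randint(1, 10),
            "min_samples_leaf": rng.randint(1, 10),
            "bootstrap": rng.choice([True, False]),
            "criterion": rng.choice(["gini", "entropy"]),
        }
        for _ in range(n_iter)
    ]


n_grid = len(grid_candidates(param_grid))
n_random = len(random_candidates(20))
print("grid candidates:", n_grid)      # 2 * 3 * 3 * 3 * 2 * 2 = 216
print("random candidates:", n_random)  # 20
```

Under these assumptions the grid search evaluates roughly ten times as many settings as the randomized search, which is consistent with the large difference in run time.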
The performance is slightly worse for the randomized search, though this is most likely a noise effect and would not carry over to a held-out test set.
Note that in practice, one would not search over this many different parameters simultaneously using grid search, but pick only the ones deemed most important.
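To illustrate why a grid search is usually restricted to a few parameters, the sketch below counts cross-validation fits for the full grid and for a reduced one. Both the choice of "important" parameters and the 3-fold setting are assumptions made for illustration, not something the example prescribes, and `n_fits` is a hypothetical helper.

```python
from math import prod

# Grid values copied from the example script.
full_grid = {
    "max_depth": [3, None],
    "max_features": [1, 3, 10],
    "min_samples_split": [1, 3, 10],
    "min_samples_leaf": [1, 3, 10],
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}


def n_fits(grid, n_folds):
    """Cross-validation fits: one per candidate setting per fold."""
    return prod(len(values) for values in grid.values()) * n_folds


# Assumption: only these two parameters are deemed most important.
important = ("max_features", "min_samples_split")
reduced_grid = {name: full_grid[name] for name in important}

N_FOLDS = 3  # assumed number of cross-validation folds

full_fits = n_fits(full_grid, N_FOLDS)
reduced_fits = n_fits(reduced_grid, N_FOLDS)
print("full grid fits:", full_fits)
print("reduced grid fits:", reduced_fits)
```

Each parameter added to the grid multiplies the number of fits by its number of values, so dropping the less important ones shrinks the cost quickly.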
Python source code: randomized_search.py
print(__doc__)

import numpy as np

from time import time
from operator import itemgetter
from scipy.stats import randint as sp_randint

from sklearn.grid_search import GridSearchCV, RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# Get some data
iris = load_digits()
X, y = iris.data, iris.target

# Build a classifier
clf = RandomForestClassifier(n_estimators=20)


# Utility function to report the best scores
def report(grid_scores, n_top=3):
    top_scores = sorted(grid_scores, key=itemgetter(1), reverse=True)[:n_top]
    for i, score in enumerate(top_scores):
        print("Model with rank: {0}".format(i + 1))
        print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
              score.mean_validation_score,
              np.std(score.cv_validation_scores)))
        print("Parameters: {0}".format(score.parameters))
        print("")


# Specify parameters and distributions to sample from
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(1, 11),
              "min_samples_leaf": sp_randint(1, 11),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# Run randomized search
n_iter_search = 20
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search)

start = time()
random_search.fit(X, y)
print("RandomizedSearchCV took %.2f seconds for %d candidates"
      " parameter settings." % ((time() - start), n_iter_search))
report(random_search.grid_scores_)

# Use a full grid over all parameters
param_grid = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [1, 3, 10],
              "min_samples_leaf": [1, 3, 10],
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# Run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid)
start = time()
grid_search.fit(X, y)

print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
      % (time() - start, len(grid_search.grid_scores_)))
report(grid_search.grid_scores_)