The purpose of this article is to document the use of GBRT (Gradient Boosted Regression Trees) in the scikit-learn package, mainly by explaining the parameters from the official documentation. For the theoretical background and practical usage I only give pointers to the sources for now, and hope to complete those parts when time allows.
Summary:
1. Example
2. Main model parameters
3. Main model attributes
Content:
1. Example
>>> import numpy as np
>>> from sklearn.metrics import mean_squared_error
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> X, y = make_friedman1(n_samples=1200, random_state=0, noise=1.0)
>>> X_train, X_test = X[:200], X[200:]
>>> y_train, y_test = y[:200], y[200:]
>>> est = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
...     max_depth=1, random_state=0, loss='ls').fit(X_train, y_train)
>>> mean_squared_error(y_test, est.predict(X_test))
5.00...
2. Main model parameters
2.1 n_estimators : int (default=100)
The number of boosting iterations, which is also the number of weak learners (trees); see the sketch below for one way to choose it.
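A minimal sketch, reusing est, X_test, and y_test from the example in section 1: staged_predict yields the prediction after each boosting iteration, which makes it easy to see how many estimators are actually worthwhile on held-out data.

import numpy as np
from sklearn.metrics import mean_squared_error

# Test MSE after each boosting iteration
test_errors = [mean_squared_error(y_test, y_pred)
               for y_pred in est.staged_predict(X_test)]
best_n = int(np.argmin(test_errors)) + 1  # iteration count with lowest test MSE
print(best_n, test_errors[best_n - 1])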
2.2 loss : {'ls', 'lad', 'huber', 'quantile'}, optional (default='ls')
The loss function to be optimized: 'ls' is least squares, 'lad' is least absolute deviation, 'huber' combines the two, and 'quantile' enables quantile regression.
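As a minimal sketch, 'huber' is more robust to outliers than the default 'ls'; its alpha parameter sets the quantile at which it switches from squared to absolute error. (Newer scikit-learn releases rename 'ls' to 'squared_error' and 'lad' to 'absolute_error'.)

from sklearn.ensemble import GradientBoostingRegressor

# Huber loss: squared error near the fit, absolute error for outliers
est_huber = GradientBoostingRegressor(loss='huber', alpha=0.9,
                                      n_estimators=100, random_state=0)
est_huber.fit(X_train, y_train)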
2.3 learning_rate : float, optional (default=0.1)
The shrinkage step size of gradient boosting, also called the learning rate. There is a trade-off with n_estimators: the lower the learning_rate, the larger n_estimators needs to be.
Empirical evidence suggests that smaller learning_rate values give lower test error; see http://scikit-learn.org/stable/modules/ensemble.html#Regularization for guidance on concrete values.
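A minimal sketch of this trade-off on the section 1 data (the specific learning_rate/n_estimators pairs below are illustrative, not recommendations):

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Smaller steps need more iterations to reach a comparable fit
for lr, n in [(0.1, 100), (0.05, 200), (0.01, 1000)]:
    m = GradientBoostingRegressor(learning_rate=lr, n_estimators=n,
                                  max_depth=1, random_state=0)
    m.fit(X_train, y_train)
    print(lr, n, mean_squared_error(y_test, m.predict(X_test)))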
2.4 max_depth : integer, optional (default=3)
Maximum depth of the individual regression trees, a pre-pruning control (the depth here does not count the root). With max_depth=1 each weak learner is a decision stump.
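A minimal sketch comparing depths on the section 1 data: stumps (max_depth=1) cannot model feature interactions, while deeper trees capture interactions but overfit more easily.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

for depth in (1, 3, 6):
    m = GradientBoostingRegressor(max_depth=depth, n_estimators=100,
                                  random_state=0).fit(X_train, y_train)
    print(depth, mean_squared_error(y_test, m.predict(X_test)))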
2.5 warm_start : bool, default=False
If True, reuse the solution of the previous call to fit and add more estimators to the ensemble instead of starting over.
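A minimal sketch: with warm_start=True, raising n_estimators and calling fit again grows the existing ensemble rather than refitting from scratch.

from sklearn.ensemble import GradientBoostingRegressor

m = GradientBoostingRegressor(n_estimators=100, warm_start=True,
                              random_state=0).fit(X_train, y_train)
m.n_estimators = 200       # ask for 100 additional trees
m.fit(X_train, y_train)    # trains only the new trees
print(len(m.estimators_))  # 200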
3. Main model attributes
3.1 train_score_ : array, shape = [n_estimators]
Stores the training loss (deviance) of the model at each boosting iteration (computed on the in-bag sample when subsample < 1).
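For example, on the fitted est from section 1:

print(est.train_score_.shape)  # (100,), one entry per boosting iteration
print(est.train_score_[:5])    # training loss after the first five iterations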
3.2 feature_importances_ : array, shape = [n_features]
The impurity-based feature importances; for details see http://scikit-learn.org/stable/modules/ensemble.html#random-forest-feature-importance
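A minimal sketch ranking the features of the section 1 model (the importances are impurity-based and sum to 1):

import numpy as np

order = np.argsort(est.feature_importances_)[::-1]  # most important first
for i in order:
    print(f"feature {i}: {est.feature_importances_[i]:.3f}")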