Preface: During my internship at Alibaba, we used the GBDT implementation in mllib to train models. However, since mllib is not open source, it is not available outside the company. Later, while taking part in a Kaggle competition, I came across a very useful GBDT tool, xgboost, so I studied it seriously.
GitHub address: https://github.com/dmlc/xgboost
The repository already has instructions for general use, so the following mainly covers how to use xgboost from Python on Windows.
1. Follow the official instructions to download the source and build the 64-bit Release version, then run "python setup.py install".
2. Add the wrapper directory to the module search path in your Python code: "sys.path.append('C:\\.........\\xgboost-master\\wrapper')" (note: this is the path to the wrapper folder inside the folder you downloaded and extracted).
3. Now try "import xgboost as xgb"; if the import succeeds, the installation works. (Note: xgboost depends on NumPy and SciPy, and their bitness must match your Python installation. The 64-bit builds of NumPy and SciPy have to be obtained separately, while the 32-bit builds can be downloaded from the official websites.)
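The check in step 3 can be sketched as a small script. This is only an illustrative helper, not part of xgboost: the function names python_bits and try_import are made up for this example. It reports whether the running interpreter is 32-bit or 64-bit and whether each dependency can be imported.

```python
import struct


def python_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def try_import(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        __import__(name)
        return True
    except (ImportError, OSError):
        # A missing package raises ImportError; a native library built for
        # the wrong bitness may surface as either ImportError or OSError.
        return False


if __name__ == "__main__":
    print("Python is %d-bit" % python_bits())
    for name in ("numpy", "scipy", "xgboost"):
        print("%s importable: %s" % (name, try_import(name)))
```

If xgboost is reported as not importable while NumPy and SciPy are, the wrapper path from step 2 or a 32-bit/64-bit mismatch is the first thing to look at.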
Below are the training script and the prediction script:
Train
#!/usr/bin/python
import sys, os
sys.path.append('c:\\xgboost-master\\wrapper')
import numpy as np
import scipy.sparse
import xgboost as xgb

### simple example
# load data from a text file (LIBSVM format) or from a binary buffer generated by xgboost
dtrain = xgb.DMatrix('c:\\predictahe_trainset_libsvmformat.txt')
dtest = xgb.DMatrix('C:\\predictahe_testset_libsvmformat.txt')
# specify parameters via a map; the definitions are the same as in the C++ version
param = {'max_depth': 6, 'eta': 0.3, 'silent': 1, 'objective': 'binary:logistic'}
# specify the validation sets used to watch performance
watchlist = [(dtest, 'eval'), (dtrain, 'train')]
num_round =   # number of boosting rounds; set this to a positive integer
bst = xgb.train(param, dtrain, num_round, watchlist)
# this is prediction
preds = bst.predict(dtest)
labels = dtest.get_label()
# error rate: fraction of rows whose thresholded prediction differs from the label
print('error=%f' % (sum(1 for i in range(len(preds)) if int(preds[i] > 0.5) != labels[i]) / float(len(preds))))
# save the trained model so the prediction script can load it
bst.save_model('C:\\xgb.model')
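The error rate printed by the training script is the fraction of test rows whose prediction, thresholded at 0.5, differs from the label. The sketch below isolates that calculation so it can be tried without xgboost. The function name error_rate and the sample values are hypothetical and only illustrate the arithmetic.

```python
def error_rate(preds, labels, threshold=0.5):
    """Fraction of rows whose thresholded prediction differs from the label.

    preds  -- predicted probabilities, e.g. the output of bst.predict(...)
    labels -- true 0/1 labels, e.g. the output of dtest.get_label()
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if len(preds) == 0:
        raise ValueError("cannot compute an error rate on empty input")
    wrong = sum(1 for p, y in zip(preds, labels) if int(p > threshold) != int(y))
    return wrong / float(len(preds))


if __name__ == "__main__":
    # Hypothetical values: the last two rows are misclassified.
    sample_preds = [0.9, 0.2, 0.6, 0.4]
    sample_labels = [1, 0, 0, 1]
    print("error=%f" % error_rate(sample_preds, sample_labels))
```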
Predict
#!/usr/bin/python
import sys, os
sys.path.append('c:\\xgboost-master\\wrapper')
import numpy as np
import scipy.sparse
import xgboost as xgb

### simple example
# load data from a text file (LIBSVM format) or from a binary buffer generated by xgboost
dtest2 = xgb.DMatrix('C:\\predictahe_temp_libsvmformat.txt')
# load the model saved by the training script
bst2 = xgb.Booster(model_file='C:\\xgb.model')
# this is prediction
preds2 = bst2.predict(dtest2)
outing = open('C:\\predictahe_temp_result.txt', 'w')
outing.write(str(int(preds2[0] > 0.5)))  # only the first prediction is written
outing.close()
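The prediction script writes only the first prediction to the result file. If the input file has several rows, one label per line may be more useful. The following is a minimal sketch under that assumption; write_labels is a hypothetical helper, not an xgboost function, and the probabilities in the demo stand in for the output of bst2.predict(dtest2).

```python
import os
import tempfile


def write_labels(preds, path, threshold=0.5):
    """Write one thresholded 0/1 label per line and return the row count."""
    count = 0
    with open(path, "w") as out:
        for p in preds:
            out.write("%d\n" % int(p > threshold))
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical probabilities used only for demonstration.
    demo_preds = [0.81, 0.07, 0.55]
    demo_path = os.path.join(tempfile.gettempdir(), "demo_result.txt")
    print("wrote %d rows to %s" % (write_labels(demo_preds, demo_path), demo_path))
```

Using a with block also closes the file if writing fails partway through.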
P.S. The input files must be in LIBSVM format.
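For readers unfamiliar with the LIBSVM format: each line holds a label followed by sparse "index:value" pairs, and zero-valued features are simply left out. The sketch below converts a dense feature list into such a line. The helper name to_libsvm_line is hypothetical, and the 1-based feature indices follow the usual LIBSVM convention, which is an assumption here.

```python
def to_libsvm_line(label, features):
    """Format one row as 'label index:value ...', skipping zero features.

    Indices are 1-based (an assumption that follows the LIBSVM convention).
    """
    parts = ["%g" % label]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append("%d:%g" % (index, value))
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical rows: (label, dense feature values)
    rows = [(1, [0.5, 0, 2]), (0, [0, 0, 1.25])]
    for label, features in rows:
        print(to_libsvm_line(label, features))
```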