machine learning in coding (python): building a prediction model with xgboost

Tags: scikit-learn, machine learning, xgboost, machine learning in coding
Continued from the previous post: http://blog.csdn.net/mmc2015/article/details/47304591
import numpy as np
import xgboost as xgb

def xgboost_pred(train, labels, test):
    params = {}
    params["objective"] = "reg:linear"
    params["eta"] = 0.005
    params["min_child_weight"] = 6
    params["subsample"] = 0.7
    params["colsample_bytree"] = 0.7
    params["scale_pos_weight"] = 1
    params["silent"] = 1
    params["max_depth"] = 9

    plst = list(params.items())

    # hold out the first 4000 rows for early stopping
    offset = 4000
    num_rounds = 10000
    xgtest = xgb.DMatrix(test)

    # create train and validation DMatrices
    xgtrain = xgb.DMatrix(train[offset:, :], label=labels[offset:])
    xgval = xgb.DMatrix(train[:offset, :], label=labels[:offset])

    # train using early stopping and predict
    watchlist = [(xgtrain, 'train'), (xgval, 'val')]
    model = xgb.train(plst, xgtrain, num_rounds, watchlist, early_stopping_rounds=120)
    preds1 = model.predict(xgtest, ntree_limit=model.best_iteration)

    # reverse train and labels and use a different 4k rows for early stopping;
    # this adds very little to the score, but it is an option if you are
    # concerned about using all the data. Note that the second model is fit
    # on log-transformed labels.
    train = train[::-1, :]
    labels = np.log(labels[::-1])

    xgtrain = xgb.DMatrix(train[offset:, :], label=labels[offset:])
    xgval = xgb.DMatrix(train[:offset, :], label=labels[:offset])

    watchlist = [(xgtrain, 'train'), (xgval, 'val')]
    model = xgb.train(plst, xgtrain, num_rounds, watchlist, early_stopping_rounds=120)
    preds2 = model.predict(xgtest, ntree_limit=model.best_iteration)

    # combine the two sets of predictions; since the evaluation metric only
    # cares about relative rank, a weighted sum works without averaging
    preds = preds1 * 1.4 + preds2 * 8.6
    return preds
(code from Kaggle)
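To see how the function above might be wired up end to end, here is a minimal usage sketch. The file names train.csv / test.csv, the target column name Hazard, and the submission format are assumptions for illustration, not something the original post specifies; xgboost_pred is the function defined above.

import pandas as pd

# load data; file names are hypothetical placeholders
train = pd.read_csv('train.csv', index_col=0)
test = pd.read_csv('test.csv', index_col=0)

# 'Hazard' is an assumed name for the regression target column
labels = train.Hazard.values
train = train.drop('Hazard', axis=1)

# xgboost needs numeric inputs: map each categorical column to integer
# codes, building the mapping over train and test together so both
# splits share the same encoding
for col in train.columns:
    if train[col].dtype == object:
        values = pd.concat([train[col], test[col]]).unique()
        mapping = {v: i for i, v in enumerate(values)}
        train[col] = train[col].map(mapping)
        test[col] = test[col].map(mapping)

preds = xgboost_pred(train.values.astype(float), labels,
                     test.values.astype(float))

# write predictions in a simple id/prediction submission format
pd.DataFrame({'Id': test.index, 'Hazard': preds}).to_csv(
    'submission.csv', index=False)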
A detailed walkthrough of the code will follow when time allows; comments and critiques are welcome.
Copyright notice: this is an original post by the author and may not be reproduced without permission.