Xgboost series
Ubuntu 14.04 installation
pip install xgboost
This failed with an error. Running
sudo apt-get update
and retrying pip produced the same error.
Solution:
sudo -H pip install --pre xgboost
Successfully installed xgboost
Cleaning up...
Success!
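Once the install finishes, a quick import check confirms the package is usable. This is a minimal sketch; `__version__` is the conventional attribute name and may report "unknown" on very old builds.

```python
# Verify that the freshly installed package can actually be imported.
try:
    import xgboost
    version = getattr(xgboost, "__version__", "unknown")
    print("xgboost import OK, version:", version)
except ImportError:
    version = None
    print("xgboost is not importable; repeat the pip install step above")
```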
Overfitting
When you observe that the training accuracy is high but the test accuracy is low, you have most likely run into an over-fitting problem.
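To make that train/test gap concrete, here is a small self-contained sketch (the data and model names are invented for illustration): a 1-nearest-neighbour model memorizes noisy training labels, so its training accuracy is perfect while its accuracy on fresh data drawn from the same distribution is noticeably lower.

```python
import random

random.seed(0)

# Toy data: the true rule is label = 1 when x > 0.5, but 20% of labels are flipped (noise).
def make_data(n):
    data = []
    for _ in range(n):
        x = random.random()
        label = 1 if x > 0.5 else 0
        if random.random() < 0.2:
            label = 1 - label          # label noise
        data.append((x, label))
    return data

train, test = make_data(200), make_data(200)

# An over-fitting model: 1-nearest-neighbour memorizes every (noisy) training point.
def knn1(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

# A simple model: just the true threshold rule.
def threshold_rule(x):
    return 1 if x > 0.5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print("1-NN       train:", accuracy(knn1, train), "test:", accuracy(knn1, test))
print("threshold  train:", accuracy(threshold_rule, train), "test:", accuracy(threshold_rule, test))
```

The memorizing model scores 100% on the data it has seen, yet generalizes worse than the simple rule, which is exactly the symptom described above.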
Xgboost is a high-speed boosting model.
A boosting classifier is an ensemble learning model. The basic idea is to combine hundreds of tree models, each with low classification accuracy, into a single model with high accuracy. The model iterates continuously, generating a new tree at each iteration. There are many methods for generating a reasonable tree at each step; here we briefly introduce the Gradient Boosting Machine proposed by Friedman. It applies the idea of gradient descent when generating each tree: based on all previously generated trees, it takes one more step toward minimizing the given objective function. With reasonable parameter settings, we often need to generate a certain number of trees to achieve satisfactory accuracy. When the dataset is large and complex, we may need thousands of iterations; if each tree takes several seconds to build, the whole training run becomes painfully slow.
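The "generate a new tree each iteration, stepping along the gradient of the objective" idea can be sketched in plain Python. This is a toy illustration, not xgboost's actual implementation: it uses hand-rolled depth-1 regression stumps and squared loss, for which the negative gradient is simply the residual, so each round fits a stump to the current residuals.

```python
# Toy gradient boosting for 1-D regression with squared loss.

def fit_stump(x, residuals):
    # Depth-1 tree: find the split threshold minimizing squared error.
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, n_trees=50, lr=0.1):
    base = sum(y) / len(y)                 # initial constant prediction
    pred = [base] * len(x)
    stumps = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is the residual y - pred.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Take one (shrunken) step toward minimizing the objective.
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(s(xi) for s in stumps)

x = list(range(10))
y = [0] * 5 + [1] * 5                      # step function to learn
predict = gradient_boost(x, y)
print(round(predict(2), 3), round(predict(8), 3))
```

Each added stump nudges the prediction a small step (the learning rate `lr`) toward the targets, which is why many trees are needed and why per-tree speed matters so much.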
Now we hope to solve this problem better through xgboost. The full name of xgboost is eXtreme Gradient Boosting. As the name suggests, it is a C++ implementation of the Gradient Boosting Machine, written by Tianqi Chen, a PhD student researching machine learning at the University of Washington. He felt deeply constrained by the computation speed and accuracy of existing libraries, so he started the xgboost project a year ago, and it gradually took shape last summer. The biggest feature of xgboost is that it can automatically use multiple CPU threads for parallel processing, while also improving accuracy on the algorithmic side. Its debut was the Higgs boson signal recognition competition on Kaggle: thanks to its outstanding efficiency and high prediction accuracy, it attracted attention in the competition forum and earned a place in a fierce contest of more than 1,700 teams. As its popularity in the Kaggle community grows, some teams have recently won first place in competitions with xgboost.
For ease of use, Tianqi Chen has wrapped xgboost as a Python library. I cooperated with him to create the R-language interface of xgboost and submit it to CRAN. Other users have wrapped it as a Julia library. The Python and R interfaces are constantly being updated; you can learn about their functionality in the following sections and choose the language with which you are most familiar.
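As a taste of the Python interface, here is a hedged sketch of the basic workflow using `xgb.DMatrix` and `xgb.train` on invented toy data. It is guarded so it only runs when xgboost (and numpy) are actually installed; parameter values here are arbitrary illustrative choices.

```python
preds = None  # stays None if xgboost is unavailable
try:
    import numpy as np
    import xgboost as xgb

    # Invented toy data: feature 0 alone determines the label.
    X = np.array([[i, i % 3] for i in range(20)], dtype=float)
    y = np.array([1 if i >= 10 else 0 for i in range(20)])

    dtrain = xgb.DMatrix(X, label=y)
    params = {"objective": "binary:logistic", "max_depth": 2, "eta": 0.3}
    model = xgb.train(params, dtrain, num_boost_round=10)

    preds = model.predict(dtrain)  # predicted probabilities for each row
    print("trained; first prediction:", round(float(preds[0]), 3))
except ImportError:
    print("xgboost (or numpy) not installed; see the installation steps above")
```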
Using IPython Notebook
Enter the following on the command line:
ipython notebook