Python 3: learning the API of the random forest classifier and the gradient boosting decision tree classifier, and comparing their predictions with those of a single decision tree.
For my other classifier code, please refer to my git: https://github.com/linyi0604/MachineLearning
import pandas as pd
# in older scikit-learn this import lived in sklearn.cross_validation
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

"""
Ensemble classifiers:
combine the predictions of multiple classifiers into one final decision.
This combination broadly falls into two types:
1. Build multiple independent classification models and let them vote, as in the
   random forest classifier. A random forest constructs several decision trees on
   the training data at the same time; when building each tree it abandons the
   usual deterministic split-selection algorithm and chooses features at random.
2. Build multiple classification models in a certain order, so that there is a
   dependency between them: each later model is fitted to improve the overall
   performance of the models already built. This assembles a more powerful
   classifier from several weaker ones, as in the gradient boosting decision tree,
   where each new tree is set up to minimize the ensemble's remaining error in
   fitting the data.
The code below compares the predictions of a single decision tree with those of a
random forest and a gradient boosting decision tree.
"""

"""
1 Preparing the data
"""
# read the Titanic passenger data, downloaded from the Internet to a local file
titanic = pd.read_csv("./data/titanic/titanic.txt")
# inspecting the data reveals missing values
# print(titanic.head())
# extract key features: sex, age and pclass are all likely to affect survival
x = titanic[['pclass', 'age', 'sex']]
y = titanic['survived']
# inspect the currently selected features
# print(x.info())
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1313 entries, 0 to 1312
Data columns (total 3 columns):
pclass    1313 non-null object
age        633 non-null float64
sex       1313 non-null object
dtypes: float64(1), object(2)
memory usage: 30.9+ KB
None
"""
# only 633 rows have an age; filling the vacancies with the mean (or median)
# is expected to have a small impact on the model
x['age'].fillna(x['age'].mean(), inplace=True)

"""
2 Data splitting
"""
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=33)
# feature extraction with a feature converter:
# categorical values are one-hot encoded, numeric values stay unchanged
vec = DictVectorizer()
x_train = vec.fit_transform(x_train.to_dict(orient="records"))
# print(vec.feature_names_)  # ['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'sex=female', 'sex=male']
x_test = vec.transform(x_test.to_dict(orient="records"))

"""
3.1 Training and predicting with a single decision tree
"""
# initialize the decision tree classifier
dtc = DecisionTreeClassifier()
# train
dtc.fit(x_train, y_train)
# predict and save the results
dtc_y_predict = dtc.predict(x_test)

"""
3.2 Training and predicting with a random forest
"""
# initialize the random forest classifier
rfc = RandomForestClassifier()
# train
rfc.fit(x_train, y_train)
# predict
rfc_y_predict = rfc.predict(x_test)

"""
3.3 Training and predicting with a gradient boosting decision tree
"""
# initialize the classifier
gbc = GradientBoostingClassifier()
# train
gbc.fit(x_train, y_train)
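# (Aside, not part of the original walkthrough: GradientBoostingClassifier exposes
# staged_predict, a generator that yields the ensemble's predictions after each
# boosting stage. Printing the test accuracy every 20 stages shows how each added
# weak tree chips away at the remaining error, i.e. the "type 2" combination
# described above. This reuses the x_test and y_test built in part 2.)
from sklearn.metrics import accuracy_score
for stage, staged_pred in enumerate(gbc.staged_predict(x_test), start=1):
    if stage == 1 or stage % 20 == 0:
        print("after %3d trees, test accuracy: %.4f" % (stage, accuracy_score(y_test, staged_pred)))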
# predict
gbc_y_predict = gbc.predict(x_test)

"""
4 Model evaluation
"""
# note: classification_report's signature is (y_true, y_pred); the predictions are
# passed first here, so the support column counts predicted rather than true labels
print("Single decision tree accuracy:", dtc.score(x_test, y_test))
print("Other indicators:\n", classification_report(dtc_y_predict, y_test, target_names=['died', 'survived']))

print("Random forest accuracy:", rfc.score(x_test, y_test))
print("Other indicators:\n", classification_report(rfc_y_predict, y_test, target_names=['died', 'survived']))

print("Gradient boosting decision tree accuracy:", gbc.score(x_test, y_test))
print("Other indicators:\n", classification_report(gbc_y_predict, y_test, target_names=['died', 'survived']))

"""
Single decision tree accuracy: 0.7811550151975684
Other indicators:
             precision    recall  f1-score   support

       died       0.91      0.78      0.84       236
   survived       0.58      0.80      0.67        93

avg / total       0.81      0.78      0.79       329

Random forest accuracy: 0.78419452887538
Other indicators:
             precision    recall  f1-score   support

       died       0.91      0.78      0.84       237
   survived       0.58      0.80      0.68        92

avg / total       0.82      0.78      0.79       329

Gradient boosting decision tree accuracy: 0.790273556231003
Other indicators:
             precision    recall  f1-score   support

       died       0.92      0.78      0.84       239
   survived       0.58      0.82      0.68        90

avg / total       0.83      0.79      0.80       329
"""
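On this split the two ensembles only edge out the single tree by about one percentage point, but the ranking (gradient boosting above random forest above the single tree) matches the intuition in the opening notes. As a side note on the "independent models that vote" idea, scikit-learn also ships a VotingClassifier that wires such an ensemble together explicitly. The sketch below is a minimal illustration, not part of the original comparison; it reuses the x_train, x_test, y_train and y_test variables produced in part 2.

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier

# hard voting: every sub-model casts one vote and the majority class wins,
# which is exactly the "several independent models that vote" scheme above
vote = VotingClassifier(estimators=[
    ('dtc', DecisionTreeClassifier()),
    ('rfc', RandomForestClassifier()),
    ('gbc', GradientBoostingClassifier()),
], voting='hard')
vote.fit(x_train, y_train)
print("Voting ensemble accuracy:", vote.score(x_test, y_test))

With voting='soft' the sub-models' predicted class probabilities are averaged instead of their hard votes, which often helps when the member models are well calibrated.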
Machine learning path: Python ensemble classifiers (random forest classification and gradient boosting decision tree classification) on Titanic survivor prediction.