Machine Learning Path: Python ensemble classifiers, random forest and gradient boosting decision tree classification of Titanic survivors

Source: Internet
Author: User
Tags: rfc

Python 3: learning the API of the random forest classifier and the gradient boosting decision tree classifier, and comparing their predictions with those of a single decision tree.
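All three models used below follow the same scikit-learn estimator interface: construct the classifier, call fit() on the training data, then predict() and score() on the test data. As a quick orientation, here is a minimal sketch that is not part of the original post and uses synthetic data from sklearn.datasets.make_classification instead of the Titanic set:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data, purely for illustrating the API
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = RandomForestClassifier()   # DecisionTreeClassifier / GradientBoostingClassifier work the same way
clf.fit(X, y)                    # train
print(clf.predict(X[:5]))        # predicted class labels for the first five samples
print(clf.score(X, y))           # mean accuracy on (X, y)

This shared fit/predict/score pattern is what allows the three classifiers below to be swapped in and out without changing the surrounding code.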

The code is attached to my Git repository; for my other classifier code, see: https://github.com/linyi0604/MachineLearning

import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

"""
Ensemble classifiers:
Combine the prediction results of multiple classifiers into one overall decision.
This combination broadly comes in two forms:
1. Build multiple independent classification models and then decide by vote, as in the
   random forest classifier. A random forest builds several decision trees on the same
   training data; when constructing each tree it gives up the usual deterministic
   feature-selection rule and instead chooses features at random.
2. Build multiple classification models in a fixed order, with dependencies between them:
   each new model must improve the combined performance of the models already built, so
   that a stronger classifier is assembled from several weaker ones, as in the gradient
   boosting decision tree. Each tree added to the boosted ensemble is built to reduce the
   error of the ensemble so far in fitting the training data.

Below, the predictions of a single decision tree are compared with those of a random
forest and a gradient boosting classifier.
"""

"""
1 Preparing the data
"""
# Read the Titanic passenger data (downloaded from the internet to a local file)
titanic = pd.read_csv("./data/titanic/titanic.txt")
# Observing the data reveals missing values
# print(titanic.head())

# Extract the key features: sex, age and pclass are all likely to affect survival
x = titanic[['pclass', 'age', 'sex']]
y = titanic['survived']
# View the currently selected features
# print(x.info())
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1313 entries, 0 to 1312
Data columns (total 3 columns):
pclass    1313 non-null object
age        633 non-null float64
sex       1313 non-null object
dtypes: float64(1), object(2)
memory usage: 30.9+ KB
None
"""
# Only 633 rows have an age; filling the vacancies with the mean (or median) is expected
# to have little impact on the model
x['age'].fillna(x['age'].mean(), inplace=True)

"""
2 Splitting the data
"""
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=33)
# Feature extraction with a feature converter
vec = DictVectorizer()
# Categorical (string) features are one-hot encoded; numerical features are kept unchanged
x_train = vec.fit_transform(x_train.to_dict(orient="record"))
# print(vec.feature_names_)  # ['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'sex=female', 'sex=male']
x_test = vec.transform(x_test.to_dict(orient="record"))
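# --- Aside (not part of the original script) ----------------------------------
# A small illustration of what DictVectorizer does: string-valued features such as
# pclass and sex are one-hot encoded into columns like 'pclass=1st' or 'sex=female',
# while numeric features such as age pass through unchanged. Uncomment to try it:
# demo = DictVectorizer(sparse=False)
# print(demo.fit_transform([{'pclass': '1st', 'age': 29.0, 'sex': 'female'},
#                           {'pclass': '3rd', 'age': 24.0, 'sex': 'male'}]))
# print(demo.feature_names_)  # ['age', 'pclass=1st', 'pclass=3rd', 'sex=female', 'sex=male']
#
# Note for newer library versions: sklearn.cross_validation was removed in
# scikit-learn 0.20, and train_test_split now lives in sklearn.model_selection;
# recent pandas also expects to_dict(orient="records") rather than "record".
# -------------------------------------------------------------------------------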
"""
3.1 Training and predicting with a single decision tree
"""
# Initialize the decision tree classifier
dtc = DecisionTreeClassifier()
# Training
dtc.fit(x_train, y_train)
# Predict and keep the results
dtc_y_predict = dtc.predict(x_test)

"""
3.2 Training and predicting with a random forest
"""
# Initialize the random forest classifier
rfc = RandomForestClassifier()
# Training
rfc.fit(x_train, y_train)
# Predict
rfc_y_predict = rfc.predict(x_test)

"""
3.3 Training and predicting with a gradient boosting decision tree
"""
# Initialize the classifier
gbc = GradientBoostingClassifier()
# Training
gbc.fit(x_train, y_train)
# Predict
gbc_y_predict = gbc.predict(x_test)

"""
4 Model evaluation
"""
print("Single decision tree accuracy:", dtc.score(x_test, y_test))
print("Other indicators:\n", classification_report(dtc_y_predict, y_test, target_names=['died', 'survived']))

print("Random forest accuracy:", rfc.score(x_test, y_test))
print("Other indicators:\n", classification_report(rfc_y_predict, y_test, target_names=['died', 'survived']))

print("Gradient boosting decision tree accuracy:", gbc.score(x_test, y_test))
print("Other indicators:\n", classification_report(gbc_y_predict, y_test, target_names=['died', 'survived']))

"""
Single decision tree accuracy: 0.7811550151975684
Other indicators:
              precision    recall  f1-score   support

       died       0.91      0.78      0.84       236
   survived       0.58      0.80      0.67        93

avg / total       0.81      0.78      0.79       329

Random forest accuracy: 0.78419452887538
Other indicators:
              precision    recall  f1-score   support

       died       0.91      0.78      0.84       237
   survived       0.58      0.80      0.68        92

avg / total       0.82      0.78      0.79       329

Gradient boosting decision tree accuracy: 0.790273556231003
Other indicators:
              precision    recall  f1-score   support

       died       0.92      0.78      0.84       239
   survived       0.58      0.82      0.68        90

avg / total       0.83      0.79      0.80       329
"""
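The ensemble idea described at the top of the script, several independent models whose votes are combined, can also be made explicit by hand. The sketch below is not part of the original post; it uses synthetic data and the modern sklearn.model_selection import path, and simply takes the hard majority vote of the same three classifier types used above:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Synthetic binary data stands in for the Titanic features
X, y = make_classification(n_samples=500, n_features=10, random_state=33)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)

models = [DecisionTreeClassifier(), RandomForestClassifier(), GradientBoostingClassifier()]
predictions = []
for model in models:
    model.fit(X_train, y_train)
    predictions.append(model.predict(X_test))
    print(type(model).__name__, "accuracy:", model.score(X_test, y_test))

# Hard majority vote: a sample is predicted as class 1 when at least two of the
# three models say so (labels here are 0/1)
votes = np.stack(predictions)
majority = (votes.sum(axis=0) >= 2).astype(int)
print("Majority-vote accuracy:", (majority == y_test).mean())

scikit-learn also ships a VotingClassifier in sklearn.ensemble that wraps this kind of hard voting; the manual version above is only meant to show what the vote amounts to.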
