Python 3: learning the API of the random forest classifier and the gradient boosting decision tree classifier, and comparing their predictions with those of a single decision tree.
For my other classifier code, please refer to my git: https://github.com/linyi0604/MachineLearning
import pandas as pd
# in older scikit-learn this import lived in sklearn.cross_validation
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

"""
Ensemble classifiers:
combine the predictions of multiple classifiers into one final decision.
This combination broadly falls into two types:
1. Build multiple independent classification models and let them vote, as in the
   random forest classifier. A random forest constructs several decision trees on
   the training data at the same time; when building each tree it abandons the
   usual deterministic split-selection algorithm and chooses features at random.
2. Build multiple classification models in a certain order, so that there is a
   dependency between them: each later model is fitted to improve the overall
   performance of the models already built. This assembles a more powerful
   classifier from several weaker ones, as in the gradient boosting decision tree,
   where each new tree is set up to minimize the ensemble's remaining error in
   fitting the data.
The code below compares the predictions of a single decision tree with those of a
random forest and a gradient boosting decision tree.
"""

"""
1 Preparing the data
"""
# read the Titanic passenger data, downloaded from the Internet to a local file
titanic = pd.read_csv("./data/titanic/titanic.txt")
# inspecting the data reveals missing values
# print(titanic.head())
# extract key features: sex, age and pclass are all likely to affect survival
x = titanic[['pclass', 'age', 'sex']]
y = titanic['survived']
# inspect the currently selected features
# print(x.info())
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1313 entries, 0 to 1312
Data columns (total 3 columns):
pclass    1313 non-null object
age        633 non-null float64
sex       1313 non-null object
dtypes: float64(1), object(2)
memory usage: 30.9+ KB
None
"""
# only 633 rows have an age; filling the vacancies with the mean (or median)
# is expected to have a small impact on the model
x['age'].fillna(x['age'].mean(), inplace=True)

"""
2 Data splitting
"""
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=33)
# feature extraction with a feature converter:
# categorical values are one-hot encoded, numeric values stay unchanged
vec = DictVectorizer()
x_train = vec.fit_transform(x_train.to_dict(orient="records"))
# print(vec.feature_names_)  # ['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'sex=female', 'sex=male']
x_test = vec.transform(x_test.to_dict(orient="records"))

"""
3.1 Training and predicting with a single decision tree
"""
# initialize the decision tree classifier
dtc = DecisionTreeClassifier()
# train
dtc.fit(x_train, y_train)
# predict and save the results
dtc_y_predict = dtc.predict(x_test)

"""
3.2 Training and predicting with a random forest
"""
# initialize the random forest classifier
rfc = RandomForestClassifier()
# train
rfc.fit(x_train, y_train)
# predict
rfc_y_predict = rfc.predict(x_test)

"""
3.3 Training and predicting with a gradient boosting decision tree
"""
# initialize the classifier
gbc = GradientBoostingClassifier()
# train
gbc.fit(x_train, y_train)
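# (Aside, not part of the original walkthrough: GradientBoostingClassifier exposes
# staged_predict, a generator that yields the ensemble's predictions after each
# boosting stage. Printing the test accuracy every 20 stages shows how each added
# weak tree chips away at the remaining error, i.e. the "type 2" combination
# described above. This reuses the x_test and y_test built in part 2.)
from sklearn.metrics import accuracy_score
for stage, staged_pred in enumerate(gbc.staged_predict(x_test), start=1):
    if stage == 1 or stage % 20 == 0:
        print("after %3d trees, test accuracy: %.4f" % (stage, accuracy_score(y_test, staged_pred)))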
# predict
gbc_y_predict = gbc.predict(x_test)

"""
4 Model evaluation
"""
# note: classification_report's signature is (y_true, y_pred); the predictions are
# passed first here, so the support column counts predicted rather than true labels
print("Single decision tree accuracy:", dtc.score(x_test, y_test))
print("Other indicators:\n", classification_report(dtc_y_predict, y_test, target_names=['died', 'survived']))

print("Random forest accuracy:", rfc.score(x_test, y_test))
print("Other indicators:\n", classification_report(rfc_y_predict, y_test, target_names=['died', 'survived']))

print("Gradient boosting decision tree accuracy:", gbc.score(x_test, y_test))
print("Other indicators:\n", classification_report(gbc_y_predict, y_test, target_names=['died', 'survived']))

"""
Single decision tree accuracy: 0.7811550151975684
Other indicators:
             precision    recall  f1-score   support

       died       0.91      0.78      0.84       236
   survived       0.58      0.80      0.67        93

avg / total       0.81      0.78      0.79       329

Random forest accuracy: 0.78419452887538
Other indicators:
             precision    recall  f1-score   support

       died       0.91      0.78      0.84       237
   survived       0.58      0.80      0.68        92

avg / total       0.82      0.78      0.79       329

Gradient boosting decision tree accuracy: 0.790273556231003
Other indicators:
             precision    recall  f1-score   support

       died       0.92      0.78      0.84       239
   survived       0.58      0.82      0.68        90

avg / total       0.83      0.79      0.80       329
"""
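On this split the two ensembles only edge out the single tree by about one percentage point, but the ranking (gradient boosting above random forest above the single tree) matches the intuition in the opening notes. As a side note on the "independent models that vote" idea, scikit-learn also ships a VotingClassifier that wires such an ensemble together explicitly. The sketch below is a minimal illustration, not part of the original comparison; it reuses the x_train, x_test, y_train and y_test variables produced in part 2.

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier

# hard voting: every sub-model casts one vote and the majority class wins,
# which is exactly the "several independent models that vote" scheme above
vote = VotingClassifier(estimators=[
    ('dtc', DecisionTreeClassifier()),
    ('rfc', RandomForestClassifier()),
    ('gbc', GradientBoostingClassifier()),
], voting='hard')
vote.fit(x_train, y_train)
print("Voting ensemble accuracy:", vote.score(x_test, y_test))

With voting='soft' the sub-models' predicted class probabilities are averaged instead of their hard votes, which often helps when the member models are well calibrated.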
Machine learning path: Python ensemble classifiers (random forest classification and gradient boosting decision tree classification) on Titanic survivor prediction.