This post uses Python 3 to learn the API of scikit-learn's decision tree classifier.
It covers feature extraction: numeric features are kept unchanged, while categorical features are expanded into new features, one per category.
The data set has to be downloaded from the internet; I saved a copy locally.
The code and data set can be downloaded from my git: https://github.com/linyi0604/MachineLearning
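To make the feature extraction idea concrete, here is a minimal sketch using scikit-learn's DictVectorizer on three made-up passenger records. The records are illustrative assumptions, not rows from the real data set. It shows that the numeric field passes through unchanged while each category of a string field becomes its own 0/1 column.

```python
from sklearn.feature_extraction import DictVectorizer

# Toy records invented for illustration (not taken from the Titanic file)
records = [
    {"pclass": "1st", "age": 29.0, "sex": "female"},
    {"pclass": "3rd", "age": 40.0, "sex": "male"},
    {"pclass": "2nd", "age": 31.5, "sex": "male"},
]

# sparse=False returns a dense NumPy array, which is easier to read when printed
vec = DictVectorizer(sparse=False)
matrix = vec.fit_transform(records)

# Feature names are sorted alphabetically by default
feature_names = list(vec.feature_names_)
print(feature_names)
# ['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'sex=female', 'sex=male']

# One row per record: age is kept as a number, categories become 0/1 indicators
print(matrix)
```

The same fitted vectorizer should then be applied to the test data with transform rather than fit_transform, so that both sets share one column layout.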
import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

'''
Decision tree
    suited to multiple features with no apparent linear relationship
    the inference logic is very intuitive
    no need to standardize the data
'''

'''
1 Preparing data
'''
# Read the Titanic passenger data, already downloaded from the internet to a local file
titanic = pd.read_csv("./data/titanic/titanic.txt")
# Inspecting the data shows that some values are missing
# print(titanic.head())

# Extract the key features: sex, age and pclass are all likely to affect survival
# .copy() so that filling in age below does not modify a view of titanic
x = titanic[['pclass', 'age', 'sex']].copy()
y = titanic['survived']
# View the currently selected features
# print(x.info())
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1313 entries, 0 to 1312
Data columns (total 3 columns):
pclass    1313 non-null object
age       633 non-null float64
sex       1313 non-null object
dtypes: float64(1), object(2)
memory usage: 30.9+ KB
None
'''
# The age column has only 633 values; fill the gaps with the mean (the median
# would also work), which should disturb the model as little as possible
x['age'] = x['age'].fillna(x['age'].mean())

'''
2 Splitting the data
'''
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=33)
# Use a feature converter for feature extraction
vec = DictVectorizer()
# Categorical features are split out into new features; numeric features stay unchanged
x_train = vec.fit_transform(x_train.to_dict(orient="records"))
# print(vec.feature_names_)   # ['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'sex=female', 'sex=male']
x_test = vec.transform(x_test.to_dict(orient="records"))

'''
3 Training the model and predicting
'''
# Initialize the decision tree classifier
dtc = DecisionTreeClassifier()
# Train
dtc.fit(x_train, y_train)
# Predict and save the results
y_predict = dtc.predict(x_test)

'''
4 Model evaluation
'''
print("accuracy:", dtc.score(x_test, y_test))
# Note: classification_report expects (y_true, y_pred). The call below passes the
# predictions first, which is how the output recorded underneath was produced, so
# precision and recall are interchanged relative to the usual reading.
print("Other indicators:\n", classification_report(y_predict, y_test, target_names=['died', 'survived']))
'''
accuracy: 0.7811550151975684
Other indicators:
              precision    recall  f1-score   support

       died       0.91      0.78      0.84       236
   survived       0.58      0.80      0.67        93

avg / total       0.81      0.78      0.79       329
'''
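The evaluation step passes the predictions as the first argument of classification_report, whereas scikit-learn's signature is classification_report(y_true, y_pred, ...). The sketch below uses made-up labels (an assumption for illustration, not the Titanic results) to show what the argument order changes: swapping the two arguments exchanges precision and recall for each class.

```python
from sklearn.metrics import classification_report, precision_score, recall_score

# Made-up labels for illustration: 0 = died, 1 = survived
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]

# Documented order: ground truth first, predictions second
precision_died = precision_score(y_true, y_pred, pos_label=0)
recall_died = recall_score(y_true, y_pred, pos_label=0)

# Swapped order: predictions are treated as if they were the ground truth
swapped_precision_died = precision_score(y_pred, y_true, pos_label=0)
swapped_recall_died = recall_score(y_pred, y_true, pos_label=0)

print("correct order:", precision_died, recall_died)
print("swapped order:", swapped_precision_died, swapped_recall_died)

# Full report in the documented argument order
print(classification_report(y_true, y_pred, target_names=["died", "survived"]))
```

The f1-score is symmetric in precision and recall, so it is unaffected by the swap, but the support column then counts predicted labels instead of true ones.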
Machine learning path: using a Python decision tree classifier to predict whether Titanic passengers survived