I have recently been planning to work through classic Kaggle cases to build up my practical skills. Today I am recording the whole process of my practice run on the Titanic problem.
Background information:
The Python code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Fri Mar 12:00:46 2017

@author: Zch
"""
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20;
# cross_val_score now lives in sklearn.model_selection
from sklearn.model_selection import cross_val_score

# Read the training and test datasets
train = pd.read_csv('e://python/data/titanic/train.csv')
test = pd.read_csv('e://python/data/titanic/test.csv')

selected_features = ['Pclass', 'Sex', 'Age', 'Embarked', 'SibSp', 'Parch', 'Fare']
# Copy the slices so the fills below do not act on a view of train/test
X_train = train[selected_features].copy()
X_test = test[selected_features].copy()
y_train = train['Survived']

# Fill missing Embarked values
X_train['Embarked'] = X_train['Embarked'].fillna('S')
X_test['Embarked'] = X_test['Embarked'].fillna('S')

# Fill missing Age values (and the missing Fare values in the test set) with the mean
X_train['Age'] = X_train['Age'].fillna(X_train['Age'].mean())
X_test['Age'] = X_test['Age'].fillna(X_test['Age'].mean())
X_test['Fare'] = X_test['Fare'].fillna(X_test['Fare'].mean())

# Vectorize the features with DictVectorizer
dict_vec = DictVectorizer(sparse=False)
X_train = dict_vec.fit_transform(X_train.to_dict(orient='records'))
print(dict_vec.feature_names_)
X_test = dict_vec.transform(X_test.to_dict(orient='records'))

rfc = RandomForestClassifier()
# Initialize XGBClassifier with the default configuration
xgbc = XGBClassifier()

# Evaluate rfc and xgbc on the training set with 5-fold cross-validation
# and print the mean classification accuracy of each
print(cross_val_score(rfc, X_train, y_train, cv=5).mean())
print(cross_val_score(xgbc, X_train, y_train, cv=5).mean())

# Predict with rfc
rfc.fit(X_train, y_train)
rfc_y_predict = rfc.predict(X_test)
rfc_submission = pd.DataFrame({'PassengerId': test['PassengerId'], 'Survived': rfc_y_predict})
# Save the rfc predictions to a CSV file
rfc_submission.to_csv('E:\\python\\data\\titanic\\rfc_sub.csv', index=False)

# Predict with xgbc
xgbc.fit(X_train, y_train)
xgbc_y_predict = xgbc.predict(X_test)
xgbc_submission = pd.DataFrame({'PassengerId': test['PassengerId'], 'Survived': xgbc_y_predict})
# Save the xgbc predictions to a CSV file
xgbc_submission.to_csv('E:\\python\\data\\titanic\\xgbc_sub.csv', index=False)
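The vectorization step in the script is easy to gloss over, so here is a small sketch of what DictVectorizer does with mixed features. The records below are toy values made up for illustration, not rows from the Titanic data. String features are expanded into one 0/1 column per category, while numeric features pass through unchanged.

```python
from sklearn.feature_extraction import DictVectorizer

# Toy records invented for this illustration (not taken from train.csv).
records = [
    {"Sex": "male", "Embarked": "S", "Age": 22.0},
    {"Sex": "female", "Embarked": "C", "Age": 38.0},
]

vec = DictVectorizer(sparse=False)
matrix = vec.fit_transform(records)

# Column names are sorted; string features become "name=value" columns.
feature_names = list(vec.feature_names_)
rows = matrix.tolist()
print(feature_names)
print(rows)

# A category that was not seen during fit ("Q" here) is silently ignored
# by transform, so its row has zeros in every Embarked column.
unseen = vec.transform([{"Sex": "male", "Embarked": "Q", "Age": 30.0}]).tolist()
print(unseen)
```

This is presumably why the script calls fit_transform on the training set and only transform on the test set: both matrices then share the same columns in the same order, which the classifiers require.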