Titanic Survival Prediction (Python)

Source: Internet
Author: User

1 Data exploration

Start by getting an overall picture of the data.

1.1 Viewing the data and its characteristics

import pandas as pd
import seaborn as sns
titanic = pd.read_csv('g:\\titanic\\train.csv')
titanic.sample(10)

Sample 10 rows to get a first impression of how the data is composed. Age and Cabin contain missing values. Before processing the data, look at its summary statistics: the minimum and maximum of each feature are reasonable, and there are no obvious outliers.

# View common summary statistics
print(titanic.describe())
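
To confirm which columns have missing values, counting nulls per column is a quick check. The sketch below uses a small made-up frame (`demo`, not the real train.csv) so that it runs on its own; on the real data the same call would be applied to the loaded frame.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for train.csv (illustrative values only).
demo = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# Number of missing entries in each column.
missing = demo.isnull().sum()
print(missing)
```

Columns with a large count relative to the number of rows (Cabin here) are candidates for being set aside; columns with a few gaps (Age here) are candidates for filling.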

2 Data analysis / processing

Common sense suggests that Name and Ticket have little to do with whether a passenger survives, so these two features are ignored for now. Cabin has many missing values and therefore little reference value, so it is also set aside for now.

2.1 Sex feature processing

Sex takes the values female and male, but some models only accept numbers, so female and male are encoded as 0 and 1 respectively.

titanic.Sex = titanic.Sex.replace("male", 1)
titanic.Sex = titanic.Sex.replace("female", 0)
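
The same encoding can also be written as a single lookup. This is a minimal sketch on a made-up series (`sex` is a stand-in for the Sex column, not the real data), using `Series.map` with a dictionary.

```python
import pandas as pd

# Made-up column standing in for the Sex feature.
sex = pd.Series(["male", "female", "female", "male"])

# One lookup table covers both values: male -> 1, female -> 0.
sex_encoded = sex.map({"male": 1, "female": 0})
print(sex_encoded.tolist())
```

Unlike `replace`, `map` turns any value that is not in the dictionary into NaN, which makes unexpected categories easy to spot.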

2.2 Age feature processing

Age has missing values: only 714 rows have an age recorded. The missing values are filled with the mean age.

titanic.Age = titanic["Age"].fillna(titanic.Age.mean())
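
To see what mean filling does, here is a small sketch on made-up ages (the values are illustrative, not taken from the dataset). `mean()` skips NaN by default, so the fill value is computed from the known ages only.

```python
import numpy as np
import pandas as pd

# Made-up ages with two gaps.
age = pd.Series([20.0, np.nan, 40.0, np.nan])

# The mean ignores NaN, so it is (20 + 40) / 2.
mean_age = age.mean()

# Every gap receives the same value.
age_filled = age.fillna(mean_age)
print(age_filled.tolist())
```

Because every gap gets the same value, this keeps the column mean unchanged but reduces its variance.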

2.3 Embarked feature processing

Replace the Embarked values S, C and Q with 0, 1 and 2 respectively.

titanic.Embarked = titanic.Embarked.replace("S", 0)
titanic.Embarked = titanic.Embarked.replace("C", 1)
titanic.Embarked = titanic.Embarked.replace("Q", 2)

The statistics for Embarked show that it also has missing values. They are filled with the most common value (S, encoded as 0).

titanic.Embarked = titanic["Embarked"].fillna(0)
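
Rather than hard-coding the fill value, the most common category can be looked up from the data. The following is a sketch on a made-up series (`embarked` is a stand-in, not the real column); it fills first and encodes afterwards, which is a reordering of the steps described here and not the original author's code.

```python
import numpy as np
import pandas as pd

# Made-up ports of embarkation with two gaps.
embarked = pd.Series(["S", "C", np.nan, "S", "Q", "S", np.nan])

# mode() ignores NaN and returns the most frequent value(s); take the first.
most_common = embarked.mode()[0]

# Fill the gaps, then encode S, C, Q as 0, 1, 2.
embarked_filled = embarked.fillna(most_common)
codes = embarked_filled.map({"S": 0, "C": 1, "Q": 2})
print(most_common, codes.tolist())
```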

3 Feature Engineering

A heat map is used to observe the correlation between each feature and Survived.

info = ["Survived", "PassengerId", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
sns.heatmap(titanic[info].corr(), annot=True, vmin=0, vmax=1)

According to the heat map, Pclass, Sex, Fare and Embarked correlate relatively strongly with Survived, so these features are used to train the machine learning models.
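
A numeric ranking can complement the heat map. With a color scale that starts at 0, negative correlations are all drawn at the low end of the scale, so sorting by absolute value helps to find strong negative relationships as well. The sketch below uses a made-up frame (`demo`, illustrative values only), so its numbers say nothing about the real dataset.

```python
import pandas as pd

# Made-up, already-encoded rows (illustrative values only).
demo = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "Pclass":   [3, 1, 2, 3, 1, 3, 2, 1],
    "Sex":      [1, 0, 0, 1, 0, 1, 1, 0],
    "Fare":     [7.3, 71.3, 13.0, 8.1, 53.1, 8.5, 21.1, 30.0],
})

# Pearson correlation of every feature with the target.
corr_with_target = demo.corr()["Survived"].drop("Survived")

# Rank by strength, regardless of sign.
ranked = corr_with_target.abs().sort_values(ascending=False)
print(corr_with_target)
print(ranked)
```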

4 Model Learning/evaluation

import numpy as np
from sklearn import linear_model
from sklearn.model_selection import cross_val_score
x = titanic[["Pclass", "Sex", "Age", "Fare", "Embarked"]]
y = titanic["Survived"]

Each model is evaluated with cross-validation, and the mean of the fold scores is reported.
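
To make the evaluation concrete, the sketch below spells out the cross-validation loop by hand on synthetic data. The data (`X_demo`, `y_demo`), the 5 folds and the use of `StratifiedKFold` are assumptions chosen for illustration; the original fold count is not known.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic data: the label depends mostly on the first feature.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

# Train on 4 folds, score on the held-out fold, and repeat 5 times.
fold_scores = []
splitter = StratifiedKFold(n_splits=5)
for train_idx, test_idx in splitter.split(X_demo, y_demo):
    model = LogisticRegression()
    model.fit(X_demo[train_idx], y_demo[train_idx])
    # score() returns accuracy for classifiers.
    fold_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))

# The reported figure is the mean of the per-fold accuracies.
mean_score = np.mean(fold_scores)
print(fold_scores, mean_score)
```

Every row is used for testing exactly once, so the mean is less dependent on one lucky or unlucky split than a single train/test split would be.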

4.1 Logistic regression

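lm = linear_model.LogisticRegression()  # default settings assumed
scores = cross_val_score(lm, x, y, scoring="accuracy")
print(np.mean(scores))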

4.2 k Nearest Neighbor

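from sklearn import neighbors
knn = neighbors.KNeighborsClassifier(weights="uniform")  # other settings assumed default
scores = cross_val_score(knn, x, y, scoring="accuracy")
print(np.mean(scores))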

4.3 Decision Tree

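from sklearn import tree
dt = tree.DecisionTreeClassifier()  # default settings assumed
scores = cross_val_score(dt, x, y, scoring="accuracy")
print(np.mean(scores))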

4.4 Random Forest

from sklearn import ensemble
rf = ensemble.RandomForestClassifier()  # default settings assumed
scores = cross_val_score(rf, x, y, scoring="accuracy")
print(np.mean(scores))

4.5 GBDT

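gbdt = ensemble.GradientBoostingClassifier()  # default settings assumed
scores = cross_val_score(gbdt, x, y, scoring="accuracy")
print(np.mean(scores))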
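
The five evaluations can also be run in one loop so that the models are compared under identical conditions. This is a sketch on synthetic data with default hyperparameters, fixed seeds and 5 folds, all of which are assumptions; its scores are unrelated to the Titanic results.

```python
import numpy as np
from sklearn import ensemble, linear_model, neighbors, tree
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for the prepared features and labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = (X_demo[:, 0] - X_demo[:, 1] + 0.5 * rng.normal(size=120) > 0).astype(int)

# The five model families from this section, with default settings.
models = {
    "Logistic regression": linear_model.LogisticRegression(),
    "k nearest neighbor": neighbors.KNeighborsClassifier(),
    "Decision tree": tree.DecisionTreeClassifier(random_state=0),
    "Random forest": ensemble.RandomForestClassifier(random_state=0),
    "GBDT": ensemble.GradientBoostingClassifier(random_state=0),
}

# Mean cross-validated accuracy per model.
results = {
    name: np.mean(cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy"))
    for name, model in models.items()
}

# Print from best to worst.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```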

5 Summary

After data exploration, data processing and a comparison of common machine learning models, GBDT turns out to perform best on Titanic survival prediction, with an accuracy above 82%.
