Python for Titanic survival prediction: data exploration and analysis

Source: Internet
Author: User

1 Data preview

1.1 head()

Preview the first few rows of the data set to see what the values of each field look like.
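As a minimal sketch of this step, the snippet below calls head() on a small hand-made DataFrame. The column names follow the common Kaggle Titanic layout and the rows are toy values, both assumptions made so the example runs on its own; with the real data you would load the file first, for example with pd.read_csv on your local copy.

```python
import pandas as pd

# Toy rows with Kaggle-style Titanic column names (illustrative values, not real records).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3, 3],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None],
})

# head() returns the first 5 rows by default; pass n for a different count.
preview = df.head()
print(preview)
```

Calling df.head(10) or df.tail() works the same way when the first five rows are not representative.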

1.2 info()

This shows how many non-null values each field has and what type each field is.
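A short sketch of what info() reports, again on toy data with assumed Kaggle-style column names. The report is captured into a string buffer here only so it can be inspected; in a notebook a plain df.info() is enough.

```python
import io
import pandas as pd

# Toy data with deliberate gaps (illustrative values, not real records).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, None, 26.0, None],
    "Cabin": [None, "C85", None, None],
})

# info() prints to stdout by default; buf= captures the report as text instead.
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print(report)

# The same non-null counts are available directly as a Series.
non_null = df.count()
print(non_null)
```

Comparing each non-null count with the total row count is a quick way to spot the fields that need filling.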

1.3 describe()

This gives a rough description of the distribution of each integer or floating-point field: look at the minimum, maximum, and quartiles to get an overview of how the data is spread and skewed.
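The sketch below shows the rows describe() produces and one way to read skew from them. The two columns and their values are toy data chosen for illustration.

```python
import pandas as pd

# Toy numeric columns (illustrative values, not real records).
df = pd.DataFrame({
    "Age": [2.0, 22.0, 26.0, 35.0, 80.0],
    "Fare": [7.25, 8.05, 13.0, 53.1, 512.33],
})

# describe() summarises numeric columns: count, mean, std, min, quartiles, max.
summary = df.describe()
print(summary)

# A mean far above the median (the 50% row) hints at a right-skewed field.
fare_is_right_skewed = summary.loc["mean", "Fare"] > summary.loc["50%", "Fare"]
print(fare_is_right_skewed)
```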

The Age field also has missing values. In general, vulnerable passengers such as the sick are given special care, so age should be a relatively important feature. Because it is a continuous value, the missing entries are filled with values predicted by an algorithm.
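The text does not say which algorithm is used for the prediction, so the following is only a sketch of the idea with a stand-in technique: an ordinary least-squares linear fit from NumPy. The feature columns (Pclass, SibSp, Fare) and all values are assumptions for illustration; any regressor could take the place of the linear fit.

```python
import numpy as np
import pandas as pd

# Toy data; the last two rows have no Age (illustrative values, not real records).
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 1, 3],
    "SibSp":  [1, 0, 0, 1, 0, 4, 0, 1],
    "Fare":   [71.3, 53.1, 13.0, 26.0, 7.9, 21.1, 30.0, 8.0],
    "Age":    [38.0, 35.0, 27.0, 30.0, 22.0, 4.0, np.nan, np.nan],
})

features = ["Pclass", "SibSp", "Fare"]  # assumed predictors
known = df[df["Age"].notna()]
unknown = df[df["Age"].isna()]

def design(frame):
    # Feature matrix with a leading column of ones for the intercept.
    return np.column_stack([np.ones(len(frame)), frame[features].to_numpy(dtype=float)])

# Fit Age ~ features on the rows where Age is known (ordinary least squares).
coef, *_ = np.linalg.lstsq(design(known), known["Age"].to_numpy(), rcond=None)

# Predict Age for the rows where it is missing and write the values back.
filled = df.copy()
filled.loc[unknown.index, "Age"] = design(unknown) @ coef

# Count the remaining missing values per column after filling.
print(filled.isnull().sum())
```

Rows whose Age was already known are left untouched; only the gaps receive predicted values.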

Finally, let's take a look at the data after filling.

3 Data exploration

3.1 Distribution of individual field values

Look at the code first:

These are canvas-related settings.

subplots_adjust() is used to adjust the spacing between the subplots on the canvas.
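The original plotting code is not reproduced here, so the following is a hedged sketch of what such a canvas setup might look like with matplotlib. The 2x2 layout, the chosen fields, the spacing values, and the toy data are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data with Kaggle-style column names (illustrative values, not real records).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1],
    "Pclass":   [3, 1, 3, 1, 3, 3, 2, 2],
    "Age":      [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0, 27.0],
    "Embarked": ["S", "C", "S", "S", "S", "Q", "S", "C"],
})

# Canvas-related settings: one figure holding a 2x2 grid of subplots.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# wspace/hspace set the gaps between subplots, as fractions of the average axes size.
fig.subplots_adjust(wspace=0.3, hspace=0.4)

# Draw each field's value distribution in its own position on the canvas.
df["Survived"].value_counts().sort_index().plot(kind="bar", ax=axes[0, 0])
axes[0, 0].set_title("Survived")
df["Pclass"].value_counts().sort_index().plot(kind="bar", ax=axes[0, 1])
axes[0, 1].set_title("Pclass")
df["Age"].plot(kind="hist", bins=8, ax=axes[1, 0])
axes[1, 0].set_title("Age")
df["Embarked"].value_counts().plot(kind="bar", ax=axes[1, 1])
axes[1, 1].set_title("Embarked")
```

To view the result, call plt.show() in an interactive session (without the Agg line) or fig.savefig() with a file name of your choice.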

The code above draws each subplot in its position on the canvas. The resulting graphs are as follows:

3.2 Explore the relationships between fields and survival, and find useful features for the model

3.2.1 The relationship between passenger class and survival

The higher the class, the greater the proportion of survivors. The proportion of passengers who were not rescued is markedly higher in class 3. This indicates that passenger class is related to survival.
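One way to compute such proportions is a cross-tabulation, sketched below. The column names Pclass and Survived are assumed from the Kaggle layout, and the toy rows only demonstrate the call; they do not reproduce the figures discussed above.

```python
import pandas as pd

# Toy data (illustrative values, not real records).
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
})

# Counts of not-rescued (0) and rescued (1) passengers per class.
counts = pd.crosstab(df["Pclass"], df["Survived"])

# normalize="index" turns each row into proportions that sum to 1.
proportions = pd.crosstab(df["Pclass"], df["Survived"], normalize="index")

print(counts)
print(proportions)
```

Looking at proportions rather than raw counts matters here because the classes hold very different numbers of passengers.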

3.2.2 The relationship between sex and survival

3.2.3 The relationship between age and survival

Most passengers are concentrated between 20 and 50 years old, and from the box plot the average age is close to 30.

Because age is a continuous value, we examine the relationship between age and survival by binning age into groups and showing statistics for each group.
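The binning described above can be sketched with pd.cut. The bin edges and labels below are hypothetical (the text does not state which age ranges are used), and the rows are toy values.

```python
import pandas as pd

# Toy data (illustrative values, not real records).
df = pd.DataFrame({
    "Age":      [4, 9, 17, 22, 28, 35, 41, 47, 58, 66],
    "Survived": [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
})

# Hypothetical bin edges; intervals are right-closed, e.g. (12, 18].
bins = [0, 12, 18, 40, 60, 100]
labels = ["0-12", "13-18", "19-40", "41-60", "61+"]
df["AgeGroup"] = pd.cut(df["Age"], bins=bins, labels=labels)

# The mean of a 0/1 column is the share of survivors in each group.
rate_by_group = df.groupby("AgeGroup", observed=False)["Survived"].mean()
print(rate_by_group)
```

The choice of edges changes the picture considerably, so it is worth trying a few groupings before drawing conclusions.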

From the data, the odds of survival change with age. There is a significant difference in survival rates between age groups, indicating that age is related to survival.

3.2.4 The relationship between number of siblings and survival

From the data, passengers with 1-2 siblings aboard have the highest survival rate.

3.2.5 The relationship between number of parents and children and survival

The data show that passengers with 1-3 parents or children aboard have the highest survival rate; beyond that, the survival rate falls as the number grows.
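For count-style fields such as the number of siblings or of parents and children, a small helper that groups by the field and averages the survival flag is enough. The helper name survival_by and the column names SibSp and Parch (taken from the Kaggle layout) are assumptions, and the rows are toy values that do not reproduce the figures above.

```python
import pandas as pd

# Toy data (illustrative values, not real records).
df = pd.DataFrame({
    "SibSp":    [0, 0, 1, 1, 2, 4, 0, 1],
    "Parch":    [0, 0, 1, 2, 0, 2, 0, 5],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0],
})

def survival_by(frame, column):
    """Return passenger count and survival rate for each value of `column`."""
    grouped = frame.groupby(column)["Survived"]
    return pd.DataFrame({"passengers": grouped.size(), "survival_rate": grouped.mean()})

for column in ["SibSp", "Parch"]:
    print(survival_by(df, column))
```

The passenger count is kept next to each rate because groups with only a handful of passengers give unreliable rates.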

3.2.6 The relationship between port of embarkation and survival

The data show that the survival rate for passengers from certain ports is significantly higher. It may be that the ship called at some ports along the way and some passengers disembarked there.

This article references: Mr. Big Tree's Blog

You are welcome to follow my blog: https://home.cnblogs.com/u/Python1234/

