1 Data preview
1.1 head()
Preview the first few rows of the data set to see what the values of each field look like.
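As a minimal sketch of this step, the snippet below builds a tiny stand-in DataFrame and previews it. The rows are invented for illustration, and the column names assume the Kaggle Titanic layout; in practice the frame would come from something like `pd.read_csv` on the training file.

```python
import pandas as pd

# Toy rows invented for illustration only; the column names assume the
# Kaggle Titanic layout and are not taken from the article.
data_train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7],
    "Survived": [0, 1, 1, 1, 0, 0, 0],
    "Pclass": [3, 1, 3, 1, 3, 3, 1],
    "Sex": ["male", "female", "female", "female", "male", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0],
})

preview = data_train.head()       # first 5 rows by default
first_three = data_train.head(3)  # or ask for a specific number of rows
print(preview)
```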
1.2 info()
You can see how many non-null values each field has and what data type each field is.
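A small sketch of what this looks like, again on made-up rows with assumed Kaggle-style column names. `info()` prints its report; `count()` and `dtypes` give the same two pieces of information as objects you can work with.

```python
import pandas as pd

# Made-up rows; "Age" and "Cabin" deliberately contain missing values.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, None, 26.0, None],
    "Cabin": [None, "C85", None, None],
})

df.info()              # prints non-null count and dtype for every column

non_null = df.count()  # non-null values per column, as a Series
dtypes = df.dtypes     # data type per column
print(non_null)
```

Columns whose non-null count is lower than the number of rows are the ones that will need filling later.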
1.3 describe()
You can get a rough description of the distribution of each integer or floating-point field, looking at the minimum, maximum, and quartiles to get an overview of how the data is spread and skewed.
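The following sketch shows the summary on two invented numeric columns (names assumed from the Kaggle layout). Comparing the mean with the median (the `50%` row) is one quick, informal way to spot skew.

```python
import pandas as pd

# Invented values; one Age is missing and is ignored by describe().
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, None],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
})

summary = df.describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(summary)

# Interquartile range of Age, read straight from the summary table.
age_iqr = summary.loc["75%", "Age"] - summary.loc["25%", "Age"]

# A mean well above the median hints at a right-skewed column.
fare_mean_above_median = summary.loc["mean", "Fare"] > summary.loc["50%", "Fare"]
```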
The Age field also has missing values. In general, vulnerable people such as the sick are given special care, so age should be a fairly important feature. Because it is a continuous value, the missing entries are filled with values predicted by an algorithm.
Finally, let's take a look at the data after filling.
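The text does not say which algorithm is used for the prediction, so the sketch below is only a stand-in: it fits an ordinary least-squares model with NumPy on the rows where Age is known and uses it to fill the rows where it is missing. The feature columns (`Pclass`, `SibSp`, `Fare`) and all values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Invented rows; the last two have a missing Age that we want to fill.
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3, 1],
    "SibSp":  [0, 1, 0, 1, 0, 2, 0, 0],
    "Fare":   [80.0, 70.0, 26.0, 30.0, 8.0, 20.0, 7.5, 90.0],
    "Age":    [40.0, 38.0, 30.0, 28.0, 22.0, 10.0, np.nan, np.nan],
})

features = ["Pclass", "SibSp", "Fare"]  # assumed predictors, not from the article
missing = df["Age"].isnull()

def design(frame):
    """Feature matrix with a leading column of ones for the intercept."""
    X = frame[features].to_numpy(dtype=float)
    return np.column_stack([np.ones(len(X)), X])

# Fit on rows with a known Age (least squares stands in for "the algorithm").
coef, *_ = np.linalg.lstsq(
    design(df[~missing]), df.loc[~missing, "Age"].to_numpy(), rcond=None
)

# Predict Age for the rows where it is missing and write the values back.
df.loc[missing, "Age"] = design(df[missing]) @ coef

print(df["Age"].isnull().sum())  # number of Age values still missing
```

Any regressor with a fit/predict interface could be swapped in for the least-squares step; the surrounding split-fit-fill pattern stays the same.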
3. Data exploration
3.1 Distribution of individual field values
Look at the code first:
These are the canvas-related settings.
subplots_adjust() is used to adjust the spacing between the subplots on the canvas.
The above is the code that draws each subplot in its corresponding position on the canvas. The graphs are as follows:
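Since the original listing is not reproduced here, the following is only a rough sketch of what such canvas setup and subplot placement can look like with matplotlib. The figure size, spacing values, grid shape, and toy data are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows with assumed Kaggle-style column names.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 2, 3],
})

fig = plt.figure(figsize=(10, 4))            # canvas size in inches
fig.subplots_adjust(wspace=0.4, hspace=0.4)  # horizontal / vertical gaps between subplots

# Left cell of a 1x2 grid: how many passengers survived or not.
ax1 = plt.subplot2grid((1, 2), (0, 0))
survived_counts = df["Survived"].value_counts().sort_index()
ax1.bar(survived_counts.index.astype(str), survived_counts.values)
ax1.set_title("Survived (1 = yes)")
ax1.set_ylabel("Passengers")

# Right cell: how many passengers travelled in each class.
ax2 = plt.subplot2grid((1, 2), (0, 1))
class_counts = df["Pclass"].value_counts().sort_index()
ax2.bar(class_counts.index.astype(str), class_counts.values)
ax2.set_title("Passenger class")
```

In an interactive session, `plt.show()` would display the figure; `fig.savefig(...)` writes it to a file instead.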
3.2 Explore the relationships between fields and survival, and find useful features for the model
3.2.1 The relationship between passenger class and survival
The higher the class, the greater the proportion of survivors. The proportion of passengers who were not rescued is markedly higher in class 3. This indicates that passenger class is related to survival.
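One way to put numbers behind a comparison like this is a count table plus a per-class survival rate. The sketch below uses invented rows (they do not reproduce the real Titanic figures) and assumes the columns are named `Pclass` and `Survived`.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Rows: class, columns: 0 = not rescued, 1 = survived.
counts = pd.crosstab(df["Pclass"], df["Survived"])

# Survived is 0/1, so its mean per class is the survival rate.
rate = df.groupby("Pclass")["Survived"].mean()

print(counts)
print(rate)
```

The count table is also what a stacked bar chart of survival by class would be drawn from.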
3.2.2 The relationship between sex and survival
Most passengers are concentrated in the 20-50 age range, and the box plot shows an average age of nearly 30.
Because age is a continuous value, we examine the relationship between age and survival by splitting age into bands and displaying the statistics for each band.
The data show that the odds of survival vary with age. There was a significant difference in survival rates between different age groups, indicating that age was related to survival.
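The age banding described above can be sketched with `pd.cut`. The band edges, band labels, and data below are assumptions for illustration; the article does not state which boundaries it used.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "Age":      [5, 10, 16, 25, 30, 35, 45, 50, 70],
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0, 0],
})

# Hypothetical band edges and labels; each band is (left, right].
bins = [0, 12, 18, 40, 60, 100]
labels = ["child", "teen", "adult", "middle", "senior"]
df["AgeBand"] = pd.cut(df["Age"], bins=bins, labels=labels)

# Survival rate and head count per age band.
band_rate = df.groupby("AgeBand", observed=False)["Survived"].mean()
band_size = df.groupby("AgeBand", observed=False)["Survived"].size()

print(band_rate)
```

Checking the head count per band alongside the rate matters, because a band with very few passengers can show an extreme rate by chance.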
3.2.4 The relationship between the number of siblings and survival
The data show that passengers with 1-2 siblings have the highest survival rate.
3.2.5 The relationship between the number of parents and children and survival
The data show that passengers with 1-3 parents and children have the highest survival rate; beyond that, the survival rate falls as the number increases.
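Both family-size comparisons follow the same pattern, so a small helper can serve for either field. The helper name `survival_rate_by`, the column names `SibSp` and `Parch` (assumed from the Kaggle layout), and the rows are all hypothetical.

```python
import pandas as pd

def survival_rate_by(frame, column):
    """Head count and survival rate for each distinct value of `column`."""
    grouped = frame.groupby(column)["Survived"]
    return pd.DataFrame({
        "passengers": grouped.size(),
        "survival_rate": grouped.mean(),
    })

# Invented rows for illustration only.
df = pd.DataFrame({
    "SibSp":    [0, 0, 0, 1, 1, 2, 4, 0],
    "Parch":    [0, 0, 1, 2, 0, 1, 5, 0],
    "Survived": [0, 1, 0, 1, 1, 1, 0, 0],
})

by_sibsp = survival_rate_by(df, "SibSp")
by_parch = survival_rate_by(df, "Parch")

print(by_sibsp)
print(by_parch)
```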
3.2.6 The relationship between port and survival
The data show that the survival rate for one of the ports is significantly higher. It may be that the ship called at some ports partway through the voyage and some of the passengers disembarked there.
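A row-normalised cross-tabulation is one compact way to compare ports. In the sketch below the column name `Embarked` and the port codes follow the Kaggle convention and are assumptions; the rows are invented and say nothing about which port actually fared best.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "Embarked": ["S", "S", "S", "S", "C", "C", "Q", "Q"],
    "Survived": [0, 0, 1, 0, 1, 1, 0, 1],
})

# normalize="index" turns each row into proportions that sum to 1,
# so column 1 is the survival rate for that port.
port_table = pd.crosstab(df["Embarked"], df["Survived"], normalize="index")

print(port_table)
```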
This article references: Mr. Big Tree's Blog
You are welcome to follow my blog: https://home.cnblogs.com/u/Python1234/
Python for Titanic survival predictions-data exploration and analysis!