Notes:
import pandas as pd
For CSV data files, open them with pd.read_csv(), for example: train_data = pd.read_csv("...")
Use train_data.head() to view the first few rows of the data (five by default).
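A minimal sketch of the two steps above. So that it runs without a file on disk, it reads CSV text through io.StringIO instead of a real path; the column names and values are hypothetical toy data, not taken from the original dataset.

```python
import io

import pandas as pd

# Stand-in for a real CSV file: column names and values are hypothetical,
# invented only so the example runs without a file on disk.
csv_text = """Education,ApplicantIncome,Property_area
Graduate,5000,Urban
Not Graduate,4000,Rural
Graduate,3000,Urban
Graduate,2500,Semiurban
Not Graduate,6000,Urban
Graduate,5500,Semiurban
"""

# pd.read_csv accepts a file path or any file-like object; StringIO wraps the text.
train_data = pd.read_csv(io.StringIO(csv_text))

# head() returns the first n rows (default n=5) as a new DataFrame.
print(train_data.head())
print(train_data.shape)  # (number of rows, number of columns)
```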
train_data.describe() gives summary statistics such as the count, mean, standard deviation, minimum, quartiles and maximum (by default only for numeric columns).
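A small sketch of describe() on a hypothetical two-column DataFrame (names and values invented for illustration). It also shows that the variance, which describe() does not list, is the square of the std row.

```python
import pandas as pd

# Hypothetical toy data: one numeric column and one text column.
df = pd.DataFrame({
    "income": [1000, 2000, 3000, 4000, 10000],
    "city": ["A", "B", "A", "C", "A"],
})

# By default describe() summarizes numeric columns only:
# count, mean, std, min, 25%, 50%, 75%, max. The "city" column is skipped.
stats = df.describe()
print(stats)

# Variance is not in the table, but it equals std squared (both use ddof=1).
print(df["income"].var())
```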
For non-numeric (text or categorical) columns, use train_data['column name'].value_counts() to count how many rows fall into each category.
In the results shown below, the label is Property_area, which has three categories (Semiurban, Urban and Rural), and value_counts() gives the number of rows in each.
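A sketch of value_counts() on an invented Property_area column. The values are made up for illustration and are not the counts from the original results.

```python
import pandas as pd

# Hypothetical column modeled on the Property_area example; values are invented.
train_data = pd.DataFrame({
    "Property_area": ["Urban", "Rural", "Semiurban", "Urban", "Semiurban", "Urban"],
})

# value_counts() counts each distinct category, sorted by count (largest first).
counts = train_data["Property_area"].value_counts()
print(counts)

# normalize=True returns the share of rows in each category instead of raw counts.
shares = train_data["Property_area"].value_counts(normalize=True)
print(shares)
```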
import matplotlib.pyplot as plt
train_data['label'].hist(bins=50)
plt.show()
This displays the distribution of the values in the column as a histogram: bins=50 splits the range of values into 50 intervals, the x-axis shows the range of values, and the y-axis shows how many values fall into each interval.
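A sketch of the histogram on a hypothetical column of 1,000 random values (the column name income is an assumption). It selects the non-interactive Agg backend so it runs without a display, and recomputes the bar heights with np.histogram to show what bins=50 means.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical numeric column: 1,000 random values standing in for real data.
train_data = pd.DataFrame({"income": rng.lognormal(mean=8, sigma=0.5, size=1000)})

# bins=50 splits [min, max] into 50 equal-width intervals along the x-axis;
# each bar's height (y-axis) is the number of values in that interval.
ax = train_data["income"].hist(bins=50)
ax.set_xlabel("income")
ax.set_ylabel("count")
# In an interactive session, call plt.show() here to open the window.

# The same counts, computed without plotting:
counts, edges = np.histogram(train_data["income"], bins=50)
print(len(counts), len(edges))  # 50 intervals need 51 edges
print(counts.sum())             # every value falls into some interval
```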
train_data.boxplot(column='label')
plt.show()
This displays the distribution of the numeric values in the column as a box plot, which shows whether the distribution is balanced.
For example, the distribution may be uneven, with extreme values (outliers) appearing as individual points beyond the whiskers.
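A sketch, on hypothetical toy data, of how the box plot flags extreme values. By default matplotlib, which pandas uses to draw boxplot(), extends the whiskers to the furthest points within 1.5 times the interquartile range (IQR) of the box and plots anything beyond them as an outlier; the second half of the block reproduces that rule with quantile().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy column with one obvious extreme value (100).
train_data = pd.DataFrame({"income": [10, 12, 11, 13, 12, 14, 11, 100]})

# One box for the column; points beyond the whiskers are drawn individually.
ax = train_data.boxplot(column="income")
# In an interactive session, call plt.show() here to open the window.

# The same 1.5 * IQR rule, applied directly:
q1 = train_data["income"].quantile(0.25)
q3 = train_data["income"].quantile(0.75)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (train_data["income"] < low) | (train_data["income"] > high)
outliers = train_data.loc[is_outlier, "income"]
print(outliers.tolist())
```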
train_data.boxplot(column='label 1', by='label 2')
plt.show()
This plots the distribution of the values in label 1 separately for each category of label 2.
As shown below, grouping by education level shows that the higher education level has more extreme wage values; other conclusions can be drawn in the same way.
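A sketch of the grouped box plot with hypothetical education and income columns (names and values invented for illustration). groupby(...).describe() prints the per-group numbers that each box summarizes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: a numeric column and a grouping column.
train_data = pd.DataFrame({
    "education": ["Graduate", "Graduate", "Graduate", "Not Graduate",
                  "Not Graduate", "Graduate", "Not Graduate"],
    "income": [30, 35, 200, 25, 28, 32, 27],
})

# by="education" draws one box of "income" per education category.
axes = train_data.boxplot(column="income", by="education")
# In an interactive session, call plt.show() here to open the window.

# The numbers behind each box: count, mean, std, min, quartiles, max per group.
summary = train_data.groupby("education")["income"].describe()
print(summary)
```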
Note: If entering a plotting command on its own does not display the figure, enter plt.show() on a separate line (this requires import matplotlib.pyplot as plt).
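A sketch of the same point in a script, assuming a headless environment: the plotting call only draws onto the current figure in memory, and plt.show() (or, when no window can open, plt.savefig()) is what produces visible output. The data and the file name income_hist.png are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to get a window from plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for illustration.
train_data = pd.DataFrame({"income": [10, 12, 11, 13, 12, 14, 11, 100]})

# The plotting call only draws onto the current figure in memory.
train_data["income"].hist(bins=5)

# Interactively, plt.show() on its own line displays the figure.
# Without a display, saving the current figure to a file is an alternative.
out_path = os.path.join(tempfile.mkdtemp(), "income_hist.png")
plt.savefig(out_path)
plt.close()
print(os.path.exists(out_path))
```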
Data analysis essays (viewing data with Python, Pandas and Matplotlib)