Source of this article: https://www.dataquest.io/mission/132/data-visualization-and-exploration
Data source: https://github.com/fivethirtyeight/data/blob/master/college-majors/recent-grads.csv
This article describes a few simple ways to explore relationships in the data.
About the raw data: this is a salary survey of recent college graduates. The important fields are Major (name of the major), Major_category (category of the major), Sample_size (sample size), ShareWomen (share of graduates who are women), and Total (total number of people in the major).
import pandas as pd
recent_grads = pd.read_csv('recent-grads.csv')
Histogram
To make a histogram, first divide the range of x-axis values into several intervals (bins), then count the number of values that fall into each interval, and use that count as the y-axis value. In pandas this is done with the pandas.DataFrame.hist() method.
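The binning step described above can be sketched in plain Python. This is only an illustration of the counting idea, not how pandas implements hist(); the function name histogram_counts and the salary values are made up for the example, and the sketch assumes the values are not all identical.

```python
def histogram_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; cannot form intervals")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Every interval is half-open [left, right) except the last one,
        # which is closed so that the maximum lands in the final bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical median salaries, for illustration only
medians = [22000, 25000, 31000, 33000, 36000, 40000, 45000, 52000, 60000, 110000]
edges, counts = histogram_counts(medians, bins=4)
print(edges)   # interval boundaries on the x-axis
print(counts)  # bar heights on the y-axis
```

The counts list is what the bars of the histogram show: one bar per interval, with its height equal to the number of values in that interval.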
# Create a histogram of median salary (the Median column)
recent_grads.hist('Median')
# By default, hist() divides the range into 10 equal-width bins and draws gridlines; here we use 20 bins and turn the gridlines off
recent_grads.hist('Median', bins=20, grid=False)
# You can also draw several histograms at once. The layout parameter here arranges the two plots in two rows and one column; without it, the two plots are placed side by side in one row
columns = ['Median', 'Sample_size']
recent_grads.hist(column=columns, layout=(2, 1), grid=False)
Box plot
A box plot is a graphical summary of the data based on the five-number summary (minimum, first quartile, second quartile (the median), third quartile, maximum). It also uses the interquartile range, IQR = third quartile - first quartile. For more details, a quick web search will help.
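As a rough illustration of the five-number summary and the IQR, here is a small sketch using only the Python standard library. The helper name five_number_summary and the sample sizes are made up for this example; quartiles can be defined in several slightly different ways, and this sketch assumes the "inclusive" interpolation method of statistics.quantiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    # n=4 splits the data into quartiles; "inclusive" treats the data as
    # containing the true minimum and maximum (an assumption of this sketch).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical sample sizes, for illustration only
sample_sizes = [3, 7, 10, 15, 22, 36, 48, 130, 356]
minimum, q1, median, q3, maximum = five_number_summary(sample_sizes)
iqr = q3 - q1  # interquartile range: third quartile minus first quartile
print(minimum, q1, median, q3, maximum, iqr)
```

In a box plot, the box spans the first to the third quartile and the line inside it marks the median, so the height of the box is the IQR.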
A box plot is made with the pandas.DataFrame.boxplot() method.
import matplotlib.pyplot as plt
# Select two columns of data
sample_size = recent_grads[['Sample_size', 'Major_category']]
# Draw one box per major category
sample_size.boxplot(by='Major_category')
# Rotate the x-axis tick labels 90 degrees so they are displayed vertically
plt.xticks(rotation=90)
Combining multiple plots
To look for correlations between several variables, you can draw them on the same graph and compare how they vary.
# Put two scatter plots on the same axes (color-coded) to see whether the variables are related
import matplotlib.pyplot as plt
plt.scatter(recent_grads['Unemployment_rate'], recent_grads['Median'], color='red')
plt.scatter(recent_grads['ShareWomen'], recent_grads['Median'], color='blue')
plt.show()
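The scatter plots give a visual impression of whether two columns move together. One possible numeric companion, not part of the original walkthrough, is the Pearson correlation coefficient. The sketch below computes it in plain Python; the function name pearson_r and the data values are invented for illustration and are not taken from recent-grads.csv.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance, up to a constant factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances, up to the same factor)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: share of women in a major vs. median salary
share_women = [0.1, 0.3, 0.5, 0.7, 0.9]
median_salary = [60000, 52000, 45000, 38000, 33000]
r = pearson_r(share_women, median_salary)
print(r)
```

A value near 1 or -1 suggests a strong linear relationship and a value near 0 suggests little linear relationship; it says nothing about cause and effect.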
Visualization of data (ii)