Update in ...
This article is the author's set of notes on writing several data charts in Python, made while reading "Python Data Analysis and Mining Practice" (Zhang Liang, January 2016, 1st edition, Mechanical Industry Press).
The source code comes from the book, and the comments reflect the author's own understanding. For ease of later use, corrections or changes to the source code are not pointed out separately. Thank you for your support.
Preparation: requires Python 2.7, pandas, NumPy, and matplotlib (other languages or data analysis libraries can also be used), or simply install Anaconda. (If you install Anaconda, you do not need to install Python separately; a separately installed Python cannot directly import the libraries bundled with Anaconda.)
Tips: It is recommended to use Anaconda's Jupyter Notebook or Spyder for the following operations; the author also tested importing the packages from the command line and from VS Code.
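A quick way to confirm that the environment described above is ready is to import the three libraries and print their versions. This is only a minimal sketch; the `versions` dictionary is a name introduced here for illustration and does not come from the book.

```python
import matplotlib
import numpy as np
import pandas as pd

# Collect the installed versions; an ImportError above means a library is missing
versions = {
    'pandas': pd.__version__,
    'numpy': np.__version__,
    'matplotlib': matplotlib.__version__,
}

for name, version in sorted(versions.items()):
    print('%s %s' % (name, version))
```

If all three lines print without an error, the interpreter you are running can see the libraries.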
First, the box plot
A box plot divides the data into four equal parts. With the data sorted by size, the value at the 25% position is the lower quartile QL and the value at the 75% position is the upper quartile QU. Outliers are defined as the values less than QL - 1.5IQR or greater than QU + 1.5IQR, where IQR is the interquartile range, that is, the difference QU - QL between the two quartiles defined above.
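The outlier rule above can be checked by hand on a small array. The following is a minimal sketch, assuming NumPy's default linear interpolation for the quartiles; the sample values and variable names are made up for illustration and are not the book's data.

```python
import numpy as np

# Hypothetical sample values, made up for illustration (not the book's data)
values = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0])

# Lower and upper quartiles (the 25% and 75% positions)
ql, qu = np.percentile(values, [25, 75])
iqr = qu - ql  # interquartile range

# Anything outside these bounds counts as an outlier
lower_bound = ql - 1.5 * iqr
upper_bound = qu + 1.5 * iqr

outliers = values[(values < lower_bound) | (values > upper_bound)]
print('QL=%s QU=%s IQR=%s' % (ql, qu, iqr))
print(outliers)
```

For this sample the bounds come out to -2 and 14, so only the value 50 is flagged.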
#-*- coding: utf-8 -*-

import pandas as pd  # import the pandas library for data analysis

data_path = 'Data.xls'  # take an Excel file as an example

"""
The following uses read_excel() to read an Excel file, with the column named "Column Name" as the index; the u prefix marks a unicode string so that Chinese displays without being garbled.
This function accepts many parameters; see the official manual. data is of the DataFrame type.
"""
data = pd.read_excel(data_path, index_col = u'Column Name')

import matplotlib.pyplot as plt  # import matplotlib for drawing
plt.rcParams['font.sans-serif'] = ['SimHei']  # use the SimHei (black) font; matplotlib does not display Chinese by default
plt.rcParams['axes.unicode_minus'] = False  # make matplotlib display the minus sign normally

plt.figure()  # create a figure

"""
The following draws a box plot and specifies 'dict' as the return type, so p is a dictionary. The value of its 'fliers' key is a list of matplotlib Line2D objects that hold the outliers.
Use get_xdata and get_ydata to obtain the x and y coordinate arrays of the data (numpy.ndarray, to be exact).
"""
p = data.boxplot(return_type = 'dict')
x = p['fliers'][0].get_xdata()
y = p['fliers'][0].get_ydata()
y.sort()  # sort in place

"""
Next, use annotate() to add annotations to the image. The syntax is annotate(u'tag', xy = (cor_x, cor_y), xytext = (cor_x, cor_y)),
where xy is the coordinates of the point being annotated, xytext is the coordinates of the annotation text, and cor_x and cor_y are coordinate values.
The coordinates need to be adjusted according to the data, so no code is posted here.
"""
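Since the annotate() step is left out above, here is a minimal sketch of what it might look like. It is not the book's code: the DataFrame, the column name 'sales', the label format, and the 0.05 horizontal offset are all assumptions made for illustration, and the offset in particular would need tuning for real data. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data made up for illustration; the book reads its data from Excel
data = pd.DataFrame({'sales': [20, 21, 22, 23, 24, 25, 26, 27, 28, 60, 1]})

plt.figure()
p = data.boxplot(return_type='dict')

# Coordinates of the outlier markers for the first (only) column
x = p['fliers'][0].get_xdata()
y = p['fliers'][0].get_ydata()

# Sort x and y together so each label stays paired with its point
order = y.argsort()
x, y = x[order], y[order]

ax = plt.gca()
labels = []
for xi, yi in zip(x, y):
    # xy is the outlier itself; xytext places the label slightly to its right
    labels.append(ax.annotate('%g' % yi, xy=(xi, yi), xytext=(xi + 0.05, yi)))
```

Call plt.show() or plt.savefig() afterwards to view the result. When several outliers sit close together the labels will overlap, which is why the offsets usually have to be adjusted per data set.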
The following box plot is obtained from the data and code in the book:
The outliers are clearly visible.
Writing several data charts in Python