Matplotlib is the most widely used 2D plotting library in Python. Learning it lets you present data more flexibly during statistical analysis, which makes it an essential library for data visualization.
It supports many kinds of charts, including line charts, bar charts, and histograms.
This article explains Matplotlib through these chart types.
Install and start
You can install Matplotlib with pip (pip install matplotlib). To use it interactively in IPython, start IPython with the ipython --matplotlib command. There is not much more to say about installation.
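To check that the installation worked, a small sketch like the following can help. It only assumes that Matplotlib was installed into the Python environment you are running; the variable names are illustrative.

```python
import matplotlib

# Report the installed version and the backend Matplotlib will draw with.
version = matplotlib.__version__
backend = matplotlib.get_backend()
print("Matplotlib version:", version)
print("Active backend:", backend)
```

If the import fails, Matplotlib is most likely installed in a different environment from the one running the interpreter.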
Before you start
Let's use a line chart example to introduce the common usage of Matplotlib.
In [1]: import matplotlib.pyplot as plt
In [2]: import pandas as pd
In [3]: import numpy as np
In [4]: df = pd.DataFrame(np.random.rand(10,2), columns=['A','B'])
In [5]: ax = df.plot(figsize=(8,4)) # df.plot draws the chart and returns a Matplotlib Axes object; figsize sets the figure size
In [6]: plt.title('Title') # Set the chart title
In [7]: plt.xlabel('X axis') # Set the X-axis label
In [8]: plt.ylabel('Y axis') # Set the Y-axis label
In [9]: plt.legend(loc='upper right') # Show the legend for the lines; loc sets its position
In [10]: plt.xlim([0,12]) # Set the X-axis limits
In [11]: plt.ylim([0,1.5]) # Set the Y-axis limits
In [12]: plt.xticks(range(10)) # Set the tick positions on the X axis
In [13]: plt.yticks(np.linspace(0,1.2,7)) # Set the tick positions on the Y axis
In [14]: ax.set_xticklabels(['%.1f' % i for i in range(10)]) # Set the tick labels on the X axis
In [15]: ax.set_yticklabels(['%.2f' % i for i in np.linspace(0,1.2,7)]) # Set the tick labels on the Y axis
# Note that the X-axis limits are 0-12 but ticks are drawn only at 0-9, and the X tick labels show one decimal place
These simple operations should give you a general feel for Matplotlib. Now we can move on to drawing specific chart types.
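The same walkthrough can also be written as a standalone script. The sketch below is one possible version, assuming the explicit Figure/Axes style (plt.subplots) instead of the plt.* shortcuts; the seeded generator rng is an addition for reproducibility and is not part of the original session.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: seeded so the chart is reproducible
df = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])

fig, ax = plt.subplots(figsize=(8, 4))  # fig is the Figure, ax is the Axes
df.plot(ax=ax)                          # draw both columns onto ax

ax.set_title("Title")
ax.set_xlabel("X axis")
ax.set_ylabel("Y axis")
ax.legend(loc="upper right")
ax.set_xlim(0, 12)
ax.set_ylim(0, 1.5)

# Set tick positions first, then the matching labels.
x_ticks = list(range(10))
y_ticks = np.linspace(0, 1.2, 7)
ax.set_xticks(x_ticks)
ax.set_xticklabels([f"{t:.1f}" for t in x_ticks])
ax.set_yticks(y_ticks)
ax.set_yticklabels([f"{t:.2f}" for t in y_ticks])
```

In a script you would typically finish with plt.show() to open the window or fig.savefig(...) to write an image file.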
Line chart
We have already drawn a simple line chart above. Want to give the line chart a richer style? No problem.
In [9]: from matplotlib import style
In [10]: style.use('ggplot')
In [11]: x = [5,8,10]
In [12]: y = [12,16,6]
In [13]: x1 = [6,9,11]
In [14]: y1 = [6,15,7]
In [15]: plt.plot(x,y,'g',label='line one',linewidth=5) # 'g' sets the color, label names the line in the legend, and linewidth sets the line width
In [16]: plt.plot(x1,y1,'r',label='line two',linewidth=5)
In [17]: plt.title('Epic Info')
In [18]: plt.xlabel('X axis')
In [19]: plt.ylabel('Y axis')
In [20]: plt.legend()
In [21]: plt.grid(True,color='k') # Turn on the background grid lines and set their color
As you can see, without any other functions, adding a few parameters to the plot function is enough to change how the chart looks. In fact, most styling of a chart is done by passing parameters to the plot function.
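As a rough illustration of styling through plot parameters, the sketch below draws one line with a format string and another with the equivalent kind of keyword arguments. The data reuses the x and y values from the session; the second line's offset and the variable names are made up for the example.

```python
import matplotlib.pyplot as plt

x = [5, 8, 10]
y = [12, 16, 6]

fig, ax = plt.subplots()

# Format string: color 'g', circle marker 'o', dashed line '--'.
(line_fmt,) = ax.plot(x, y, "go--", label="format string")

# Similar styling spelled out with keyword arguments.
(line_kw,) = ax.plot(
    x, [v + 2 for v in y],  # shifted up by 2 only so the lines do not overlap
    color="red", linestyle=":", marker="s", linewidth=2, alpha=0.7,
    label="keyword arguments",
)
ax.legend()
```

Format strings are compact for quick plots, while keyword arguments tend to be easier to read once several properties are set.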
The plot function accepts many parameters. When plotting through pandas, as in the DataFrame example earlier, the plot method of a Series or DataFrame wraps Matplotlib and takes the parameters below. Only the more commonly used ones are explained here. The details are as follows:
Series.plot(kind='line', ax=None, figsize=None, use_index=True, title=None, grid=None, legend=False, style=None, logx=False, logy=False, loglog=False, xticks=None, yticks=None, xlim=None, ylim=None, rot=None, fontsize=None, colormap=None, table=False, yerr=None, xerr=None, label=None, secondary_y=False, **kwds)
The Series index is used as the x-axis (abscissa)
The values are used as the y-axis (ordinate)
kind → chart type: line, bar, barh, ... (line chart, vertical bar chart, horizontal bar chart, ...)
label → legend label; for a DataFrame, the column names are used as labels
style → style string combining line style (-), marker (.), and color (g)
color → line color; set the color either here or in the style string
alpha → transparency, from 0 to 1
use_index → use the index as tick labels; the default is True
rot → rotation of the tick labels in degrees, 0-360
grid → bool, whether to show the background grid; plt.grid is often used directly instead
xlim, ylim → x-axis and y-axis limits
xticks, yticks → x-axis and y-axis tick values
colormap → colormap to pick the line colors from
figsize → figure size
title → chart title
legend → whether to show the legend; plt.legend() is often used directly instead
Readers who like hands-on practice can try changing these parameters to see the effect:
In [22]: df = pd.DataFrame(np.random.randn(30, 4), columns=list('ABCD')).cumsum()
In [23]: draw = df.plot(style ='--.',alpha = 0.8,colormap ='summer_r',grid=True)
In [24]: draw.text(20,5,'(20,5)',fontsize=12) # Add a text annotation; the arguments are, in order, the x coordinate, the y coordinate, the text content, and the font size
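A few more of the listed parameters can be combined in one call. The following is a minimal sketch, assuming a seeded random DataFrame like the one in the session; the title text, the per-column style list, and the axis label are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # assumption: seeded for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 4)), columns=list("ABCD")).cumsum()

ax = df.plot(
    kind="line",
    figsize=(8, 4),
    title="Cumulative sums",
    style=["-", "--", "-.", ":"],  # one style string per column
    alpha=0.8,
    grid=True,
    legend=True,
    rot=45,                        # rotate the x tick labels by 45 degrees
)
ax.set_xlabel("step")
```

Because the call returns a Matplotlib Axes, any further adjustment can be made on ax afterwards.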
With this, you can handle most line chart drawing.
Bar chart
Bar charts are often used to compare several groups of data. Let's see how to use them.
In [27]: plt.bar([0.25,1.25,2.25,3.25,4.25],[50,40,70,80,20],label="BMW", color='b', width=.5) # plt.bar draws a bar chart. The first argument gives the x position of each bar (its center by default), the second gives the height of each bar, label names the series in the legend, color sets the fill color, and width sets the bar width
In [28]: plt.bar([.75,1.75,2.75,3.75,4.75],[80,20,20,50,60],label="Audi", color='r',width=.5 )
In [29]: plt.legend()
In [30]: plt.xlabel('Day')
In [31]: plt.ylabel('Distance(kms)')
In [32]: plt.title('Information')
The style of a bar chart is set mainly through the bar function. In current Matplotlib versions, the signature and main parameters are as follows:
plt.bar(x, height, width=0.8, bottom=None, *, align='center', **kwargs)
x, height: the x position and the height of each bar
width: the width of each bar in x-axis data units (default 0.8)
facecolor: the fill color of the bars; edgecolor is the color of the border
bottom: the y value at which each bar starts (default 0)
align: how each bar is positioned relative to its x value. 'center' (the default) centers the bar on x, and 'edge' places the left edge of the bar at x. Older Matplotlib versions used a left parameter for the left edge of each bar instead of x.
xerr/yerr: error bars in the x/y direction
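The parameters above can be put together in a grouped bar chart. This is a sketch under a few assumptions: it reuses the BMW and Audi values from the session, the width of 0.4 and the tick labels are illustrative, and the error size passed to yerr is made up.

```python
import matplotlib.pyplot as plt
import numpy as np

days = np.arange(5)  # one group per day
bmw = [50, 40, 70, 80, 20]
audi = [80, 20, 20, 50, 60]
width = 0.4          # bar width in x-axis data units

fig, ax = plt.subplots()
# With the default align='center', x is the center of each bar,
# so shifting by half a width places the two bars side by side.
bmw_bars = ax.bar(days - width / 2, bmw, width=width, label="BMW",
                  facecolor="tab:blue", edgecolor="black")
audi_bars = ax.bar(days + width / 2, audi, width=width, label="Audi",
                   facecolor="tab:red", edgecolor="black",
                   yerr=5, capsize=3)  # yerr=5 is a made-up error size
ax.set_xticks(days)
ax.set_xticklabels([f"Day {d + 1}" for d in days])
ax.set_ylabel("Distance (km)")
ax.legend()
```

Computing positions from np.arange keeps the groups aligned with the tick labels, which is harder to maintain with hand-typed offsets.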
Histogram
A histogram looks much like a bar chart, but it is usually used to show the distribution of a single variable rather than to compare groups.
In [33]: population_age = [22,55,62,45,21,22,34,42,42,4,2,102,95,85,55,110,120,70,65,55,111,115,80,75,65,54, 44,43,42,48]
In [34]: bins = [0,10,20,30,40,50,60,70,80,90,100]
In [35]: plt.hist(population_age, bins, histtype='bar', color='b', rwidth=0.8)
In [37]: plt.xlabel('age groups')
In [38]: plt.ylabel('Number of people')
In [39]: plt.title('Histogram')
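The hist function also returns the counts it computed, which is a handy way to check what was drawn. The sketch below reuses the data and bins from the session and prints the count per bin; the variable names are illustrative.

```python
import matplotlib.pyplot as plt

population_age = [22, 55, 62, 45, 21, 22, 34, 42, 42, 4, 2, 102, 95, 85, 55,
                  110, 120, 70, 65, 55, 111, 115, 80, 75, 65, 54, 44, 43, 42, 48]
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

fig, ax = plt.subplots()
# hist returns the count per bin, the bin edges, and the drawn bar patches.
counts, edges, patches = ax.hist(population_age, bins, rwidth=0.8)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>3.0f}-{right:<3.0f}: {int(n)}")

# Values beyond the last bin edge are not counted in any bin.
outside = [age for age in population_age if age > bins[-1]]
print("Ages outside the bins:", outside)
```

With these bins, the five ages above 100 fall outside every bin and do not appear in the chart, so extend the bin edges if they should be shown.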
The histogram is drawn mainly with the hist function. In current Matplotlib versions, the parameters are as follows:
plt.hist(x, bins=None, range=None, density=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, *, data=None, **kwargs)
bins: the number of bins (an integer) or a sequence of bin edges
density: whether to normalize the histogram so that its total area is 1 (older Matplotlib versions called this normed)
histtype: drawing style, one of 'bar', 'barstacked', 'step', 'stepfilled'
orientation: 'horizontal' or 'vertical'
align: alignment of the bars relative to the bin edges, one of 'left', 'mid', 'right'
stacked: whether to stack multiple data sets on top of each other
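Several of these parameters are easier to understand side by side. The following sketch assumes synthetic, normally distributed ages generated only for illustration, and compares a normalized histogram with a cumulative, horizontal one.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
ages = rng.normal(loc=45, scale=15, size=500)  # synthetic data, illustration only

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# bins as an integer: 12 equal-width bins over the given range.
density, edges, _ = ax_left.hist(ages, bins=12, range=(0, 100), density=True,
                                 histtype="stepfilled", alpha=0.6)
ax_left.set_title("density=True")

# Cumulative counts drawn as an outline, with horizontal orientation.
cum_counts, _, _ = ax_right.hist(ages, bins=12, range=(0, 100), cumulative=True,
                                 histtype="step", orientation="horizontal")
ax_right.set_title("cumulative=True, horizontal")

# With density=True, bar height times bin width sums to 1.
area = float(np.sum(density * np.diff(edges)))
print("Total area:", area)
```

Normalizing with density is useful when comparing data sets of different sizes, since the bar heights no longer depend on the number of samples.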