Python Data Visualization: Create a Scatter Plot Using Matplotlib

Source: Internet
Author: User

Matplotlib overview: Matplotlib is a desktop plotting package for creating high-quality charts (mainly 2D). John Hunter started the project in 2002 to build a MATLAB-style plotting interface for Python. When used in a Python IDE such as PyCharm, matplotlib also offers interactive features such as zoom and pan. It supports many GUI backends on various operating systems and can export figures to common vector and raster formats: PDF, SVG, JPG, PNG, BMP, GIF, and so on. In addition, Matplotlib has a number of add-on toolkits, such as mplot3d for 3D graphics and Basemap for maps and projections.

Prepare data: Parse data from a text file

The data comes from Chapter 2 (the k-nearest neighbors algorithm) of "Machine Learning in Action". The datingTestSet2.txt file: HTTPS://PAN.BAIDU.COM/S/1PLWZRSV

The data used in this article contains three features: frequent flyer miles earned per year, percentage of time spent playing video games, and liters of ice cream consumed per week. The classification result is the fourth column of the file and takes only three values: 3, 2, and 1. The data file format is as follows:
Miles flown per year   Game time (%)   Ice cream (liters/week)   Class
40920                  8.326976        0.953952                  3
14488                  7.153469        1.673904                  2
26052                  1.441871        0.805124                  1
......                 ......          ......                    ......

The format of the data in the DatingTestSet2.txt file is as follows:

After being parsed by the file2matrix function, the feature data above is returned as a matrix together with a vector of class labels. Save the following code, which converts the text records into a NumPy matrix, in knn.py:

    from numpy import *

    def file2matrix(filename):
        fr = open(filename)
        numberOfLines = len(fr.readlines())    # get the number of lines in the file
        fr.close()
        returnMat = zeros((numberOfLines, 3))  # prepare matrix to return
        classLabelVector = []                  # prepare labels to return
        fr = open(filename)
        index = 0
        for line in fr.readlines():
            line = line.strip()
            listFromLine = line.split('\t')    # columns are tab-separated
            returnMat[index, :] = listFromLine[0:3]
            classLabelVector.append(int(listFromLine[-1]))
            index += 1
        fr.close()
        return returnMat, classLabelVector
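The parsing step can be tried without downloading the data set. The following is a minimal sketch, not the article's exact knn.py: it assumes a tab-separated file with three numeric columns followed by an integer label, writes the three sample rows shown in the table above to a temporary file, and parses them with a variant of file2matrix that uses a `with` block so the file is always closed.

```python
import os
import tempfile

import numpy as np


def file2matrix(filename):
    """Parse a tab-separated file into a feature matrix and a label list."""
    with open(filename) as fr:
        # Skip blank lines so a trailing newline does not add an empty row
        lines = [line.strip() for line in fr if line.strip()]
    features = np.zeros((len(lines), 3))
    labels = []
    for index, line in enumerate(lines):
        fields = line.split('\t')
        features[index, :] = [float(value) for value in fields[0:3]]
        labels.append(int(fields[-1]))
    return features, labels


# Three sample rows copied from the table above; the temporary file is only for this demo.
sample = (
    "40920\t8.326976\t0.953952\t3\n"
    "14488\t7.153469\t1.673904\t2\n"
    "26052\t1.441871\t0.805124\t1\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

features, labels = file2matrix(path)
os.remove(path)

print(features.shape)
print(labels)
```

The function returns the features as a float matrix and the labels as a plain Python list, which is the shape of data the later plotting code expects.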

  

To read the file with file2matrix, make sure the file to be parsed is stored in the current working directory. After importing the data, quickly check its format:

    >>> import knn
    >>> datingDataMat, datingLabels = knn.file2matrix('datingTestSet2.txt')
    >>> datingDataMat[0:6]
    array([[  4.09200000e+04,   8.32697600e+00,   9.53952000e-01],
           [  1.44880000e+04,   7.15346900e+00,   1.67390400e+00],
           [  2.60520000e+04,   1.44187100e+00,   8.05124000e-01],
           [  7.51360000e+04,   1.31473940e+01,   4.28964000e-01],
           [  3.83440000e+04,   1.66978800e+00,   1.34296000e-01],
           [  7.29930000e+04,   1.01417400e+01,   1.03295500e+00]])
    >>> datingLabels[0:6]
    [3, 2, 1, 1, 1, 1]

Analyze data: Create scatter plots with matplotlib

Import matplotlib and call its scatter function to draw a scatter plot:

    >>> import matplotlib
    >>> import matplotlib.pyplot as plt
    >>> fig = plt.figure()
    >>> ax = fig.add_subplot(111)
    >>> ax.scatter(datingDataMat[:,1], datingDataMat[:,2])
    <matplotlib.collections.PathCollection object at 0x0000019e14c9a470>
    >>> plt.show()
    >>>

The resulting scatter plot is as follows:

The scatter plot uses the second and third columns of the datingDataMat matrix, which hold the features "percentage of time spent playing video games" and "liters of ice cream consumed per week". The complete knn.py code is as follows:

    import matplotlib
    import numpy as np
    from numpy import *
    from matplotlib import pyplot as plt

    def file2matrix(filename):
        fr = open(filename)
        numberOfLines = len(fr.readlines())    # get the number of lines in the file
        fr.close()
        returnMat = zeros((numberOfLines, 3))  # prepare matrix to return
        classLabelVector = []                  # prepare labels to return
        fr = open(filename)
        index = 0
        for line in fr.readlines():
            line = line.strip()
            listFromLine = line.split('\t')
            returnMat[index, :] = listFromLine[0:3]
            classLabelVector.append(int(listFromLine[-1]))
            index += 1
        fr.close()
        return returnMat, classLabelVector

    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    fig = plt.figure()
    ax = plt.subplot(111)
    ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2])
    plt.show()

It is difficult to see any useful pattern in this plot because the class labels of the samples are not used. To make the data easier to interpret, matplotlib's scatter function supports customizing the points on the scatter plot. Call the scatter function with the following parameters, which scale both the size and the color of each point by its class label:

    ax.scatter(datingDataMat[:,1], datingDataMat[:,2], 15.0*array(datingLabels), 15.0*array(datingLabels))

The resulting scatter plot is as follows:
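The third and fourth positional arguments of scatter are `s` (marker area in points squared) and `c` (color values, mapped through the current colormap when numeric). A minimal sketch with the keywords written out makes this explicit. The arrays below are small stand-ins built from the first three sample rows rather than the full data set, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs without a GUI
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins for datingDataMat[:, 1], datingDataMat[:, 2] and datingLabels
game_time = np.array([8.326976, 7.153469, 1.441871])
ice_cream = np.array([0.953952, 1.673904, 0.805124])
labels = np.array([3, 2, 1])

fig, ax = plt.subplots()
# s sets one marker area per point, c sets one color value per point
points = ax.scatter(game_time, ice_cream, s=15.0 * labels, c=15.0 * labels)

sizes = points.get_sizes()  # the per-point sizes stored on the PathCollection
print(sizes)
plt.close(fig)
```

Because both arguments are derived from the label, points of class 3 are drawn three times as large as points of class 1 and in a different color.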

Using the class labels stored in datingLabels, the points are drawn with a different color and size for each class, so the regions occupied by the three sample classes become roughly visible. For a better result, plot columns 0 and 1 of the datingDataMat matrix, with a red '*' for class label 1, a green 'o' for class label 2, and a blue '+' for class label 3. The modified code is as follows:

    import matplotlib
    import numpy as np
    from numpy import *
    from matplotlib import pyplot as plt
    from matplotlib.font_manager import FontProperties

    def file2matrix(filename):
        fr = open(filename)
        numberOfLines = len(fr.readlines())    # get the number of lines in the file
        fr.close()
        returnMat = zeros((numberOfLines, 3))  # prepare matrix to return
        classLabelVector = []                  # prepare labels to return
        fr = open(filename)
        index = 0
        for line in fr.readlines():
            line = line.strip()
            listFromLine = line.split('\t')
            returnMat[index, :] = listFromLine[0:3]
            classLabelVector.append(int(listFromLine[-1]))
            index += 1
        fr.close()
        return returnMat, classLabelVector

    zhfont = FontProperties(fname='C:/Windows/Fonts/simsun.ttc', size=12)
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    # Create a single figure; calling plt.figure() twice would open an extra empty window
    fig = plt.figure(figsize=(8, 5), dpi=80)
    ax = plt.subplot(111)
    datingLabels = np.array(datingLabels)
    # Select the rows of each class and plot them with their own marker and color
    idx_1 = np.where(datingLabels == 1)
    p1 = ax.scatter(datingDataMat[idx_1, 0], datingDataMat[idx_1, 1], marker='*', color='r', label='1', s=10)
    idx_2 = np.where(datingLabels == 2)
    p2 = ax.scatter(datingDataMat[idx_2, 0], datingDataMat[idx_2, 1], marker='o', color='g', label='2', s=20)
    idx_3 = np.where(datingLabels == 3)
    p3 = ax.scatter(datingDataMat[idx_3, 0], datingDataMat[idx_3, 1], marker='+', color='b', label='3', s=30)
    plt.xlabel(u'Frequent flyer miles earned per year', fontproperties=zhfont)
    plt.ylabel(u'Percentage of time spent playing video games', fontproperties=zhfont)
    ax.legend((p1, p2, p3), (u'Did not like', u'Somewhat attractive', u'Very attractive'), loc=2, prop=zhfont)
    plt.show()

The resulting scatter plot is as follows:

The second method:

    import matplotlib
    from numpy import zeros
    from matplotlib import pyplot as plt
    from matplotlib import font_manager

    def file2matrix(filename):
        fr = open(filename)
        numberOfLines = len(fr.readlines())    # get the number of lines in the file
        fr.close()
        returnMat = zeros((numberOfLines, 3))  # prepare matrix to return
        classLabelVector = []                  # prepare labels to return
        fr = open(filename)
        index = 0
        for line in fr.readlines():
            line = line.strip()
            listFromLine = line.split('\t')
            returnMat[index, :] = listFromLine[0:3]
            classLabelVector.append(int(listFromLine[-1]))
            index += 1
        fr.close()
        return returnMat, classLabelVector

    matrix, labels = file2matrix('datingTestSet2.txt')
    zhfont = matplotlib.font_manager.FontProperties(fname='C:/Windows/Fonts/simsun.ttc', size=12)
    plt.figure(figsize=(8, 5), dpi=80)
    axes = plt.subplot(111)

    # Collect the three classes of data separately
    # The x axis represents the number of miles flown
    # The y axis represents the percentage of time spent playing video games
    type1_x = []
    type1_y = []
    type2_x = []
    type2_y = []
    type3_x = []
    type3_y = []
    for i in range(len(labels)):
        if labels[i] == 1:  # did not like
            type1_x.append(matrix[i][0])
            type1_y.append(matrix[i][1])
        if labels[i] == 2:  # somewhat attractive
            type2_x.append(matrix[i][0])
            type2_y.append(matrix[i][1])
        if labels[i] == 3:  # very attractive
            # print(i, ':', labels[i], ':', type(labels[i]))
            type3_x.append(matrix[i][0])
            type3_y.append(matrix[i][1])

    type1 = axes.scatter(type1_x, type1_y, s=20, c='red')
    type2 = axes.scatter(type2_x, type2_y, s=40, c='green')
    type3 = axes.scatter(type3_x, type3_y, s=50, c='blue')
    plt.xlabel(u'Frequent flyer miles earned per year', fontproperties=zhfont)
    plt.ylabel(u'Percentage of time spent playing video games', fontproperties=zhfont)
    axes.legend((type1, type2, type3), (u'Did not like', u'Somewhat attractive', u'Very attractive'), loc=2, prop=zhfont)
    plt.show()

The resulting scatter plot is as follows:
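The six per-class lists in the second method can also be replaced by a loop over the classes with a boolean mask. The sketch below is one possible variant, not the article's code: the `styles` dictionary and its English legend names are illustrative, the four data rows are taken from the sample output shown earlier, and the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs without a GUI
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: columns 0 and 1 of the first four sample rows and their labels
matrix = np.array([
    [40920.0, 8.326976],
    [14488.0, 7.153469],
    [26052.0, 1.441871],
    [75136.0, 13.147394],
])
labels = np.array([3, 2, 1, 1])

# Hypothetical style table: class label -> (color, marker size, legend name)
styles = {
    1: ("red", 20, "Did not like"),
    2: ("green", 40, "Somewhat attractive"),
    3: ("blue", 50, "Very attractive"),
}

fig, axes = plt.subplots(figsize=(8, 5), dpi=80)
handles = {}
for cls, (color, size, name) in styles.items():
    mask = labels == cls  # True for the rows that belong to this class
    handles[cls] = axes.scatter(matrix[mask, 0], matrix[mask, 1],
                                s=size, c=color, label=name)

axes.set_xlabel("Frequent flyer miles earned per year")
axes.set_ylabel("Percentage of time spent playing video games")
axes.legend(loc=2)  # loc=2 places the legend in the upper left corner

# Number of points drawn for each class
counts = {cls: len(handle.get_offsets()) for cls, handle in handles.items()}
print(counts)
plt.close(fig)
```

Passing `label=` to each scatter call lets `legend()` pick up the names automatically, so the handles and names do not need to be listed by hand.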

 

Summary: This article briefly introduced Matplotlib and showed how to use the library to display data graphically. Finally, by modifying the parameters of matplotlib's scatter function, the class regions in the scatter plot were made clearer.

Additional knowledge points:

1. When matplotlib generates a chart, Chinese characters are not supported by default, and every Chinese character is displayed as a box. Workaround: specify a Chinese font in the code:

    # -*- coding: utf-8 -*-
    import matplotlib.pyplot as plt
    import matplotlib
    zhfont1 = matplotlib.font_manager.FontProperties(fname='C:/Windows/Fonts/simsun.ttc')
    plt.xlabel(u"Horizontal Xlabel", fontproperties=zhfont1)

The font files are located in C:\Windows\Fonts\. The SimSun font file is simsun.ttf (simsun.ttc on Windows 8 and Windows 10); other fonts can also be used.
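Because the font path above exists only on Windows, a script that hard-codes it may not find the font on other systems. One way to guard against that is to check for the file first and fall back to matplotlib's default font. The helper name `load_font` below is hypothetical and only illustrates the idea.

```python
import os

from matplotlib.font_manager import FontProperties


def load_font(path, size=12):
    """Return FontProperties for the font file at path, or the default font if it is missing."""
    if os.path.exists(path):
        return FontProperties(fname=path, size=size)
    # Fall back to the default font family; Chinese text may still render as boxes
    return FontProperties(size=size)


zhfont = load_font('C:/Windows/Fonts/simsun.ttc', size=12)
print(zhfont.get_size())
```

The returned object can be passed as `fontproperties=zhfont` to `plt.xlabel` or as `prop=zhfont` to `legend`, in the same way as in the code above.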

2. ax = fig.add_subplot(111) returns an Axes instance and is a common way to add axes to a figure. The first parameter is the total number of subplot rows, the second is the number of subplot columns, and the third is the position of the subplot within the figure.
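As a small illustration of the three parameters, the sketch below adds subplots to a 2 x 2 grid. The three-digit form 111 is shorthand for the arguments (1, 1, 1). The variable names are illustrative, and the Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs without a GUI
import matplotlib.pyplot as plt

# One subplot filling the whole figure: 1 row, 1 column, position 1
fig_single = plt.figure()
ax_single = fig_single.add_subplot(111)

# A 2 x 2 grid; positions are numbered from 1, left to right and top to bottom
fig_grid = plt.figure()
ax_top_left = fig_grid.add_subplot(2, 2, 1)
ax_bottom_right = fig_grid.add_subplot(2, 2, 4)

print(len(fig_single.axes), len(fig_grid.axes))
plt.close(fig_single)
plt.close(fig_grid)
```

Only the positions that are requested get an Axes, so the grid figure above contains two subplots and leaves the other two cells empty.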
