(Big) Data Processing: From TXT to Data Visualization

Source: Internet
Author: User

Python 2.7
IDE: PyCharm 5.0.3
NumPy 1.11.0
Matplotlib 1.5.1

The data visualized here comes from the book Machine Learning in Action (in other words, the data is borrowed, and I changed the program a little to make it easier to read).

Preface

The goal is to visualize the data stored in a TXT file so that it can be analyzed. All you need to know is the following.

In each row, the first column is the number of frequent-flyer miles earned per year, the second is the percentage of time spent playing video games, the third is the liters of ice cream eaten per week, and the fourth is how interested a certain Ms. XX is in dating this kind of person. For example, the first row means: he flies 40,920 miles a year, spends about 8% of his time playing games, and eats about 0.9 liters of ice cream a week, and Ms. XX finds him quite attractive and would very much like to date him. That's all it means.

Preparing the Ingredients

The TXT file you download looks like the figure on the left, and its fourth column is the category. How do we turn that text into the numeric interest levels shown on the right? It's quite simple: while scanning each line, check which string it holds and replace it directly. The snippet below, taken out of the full code, shows how.

# Convert the rating text into a number
if listFromLine[3] == 'largeDoses':
    listFromLine[3] = 3
elif listFromLine[3] == 'smallDoses':
    listFromLine[3] = 2
else:
    listFromLine[3] = 1
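The same conversion can also be written as a dictionary lookup, which keeps the string-to-number table in one place. This is a minimal sketch, not the book's code: parse_line and LABEL_TO_INT are hypothetical names, it assumes the label strings are spelled 'largeDoses', 'smallDoses' and 'didntLike', and the sample line is only modeled on the first row described above.

```python
# Hypothetical table: rating text in the 4th column -> numeric interest level.
# Assumes the labels are spelled exactly 'largeDoses', 'smallDoses', 'didntLike'.
LABEL_TO_INT = {'largeDoses': 3, 'smallDoses': 2, 'didntLike': 1}


def parse_line(line):
    """Split one tab-separated line into (three float features, int label)."""
    fields = line.strip().split('\t')
    features = [float(v) for v in fields[0:3]]
    # Unknown strings fall back to 1, mirroring the else branch of the if/elif chain
    label = LABEL_TO_INT.get(fields[3], 1)
    return features, label


print(parse_line('40920\t8.326976\t0.953952\tlargeDoses\n'))
# → ([40920.0, 8.326976, 0.953952], 3)
```

The dict version behaves like the if/elif chain; adding a new category only means adding one entry to the table.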

After the conversion, the data should look like the figure on the right: "would very much like to date" becomes 3, "somewhat" becomes 2, and "doesn't want to" becomes 1. That's it for the categories.

From TXT to Arrays

The TXT data I have worked with so far includes one month of Beijing taxi GPS data (350 GB), covered in (Big) Data Processing: Preprocessing and Migration from TXT to MySQL, and AVIRIS hyperspectral remote-sensing images (.mat), which can also be converted to TXT. Cleaning TXT data and loading it into an array or a database is therefore a prerequisite you cannot skip before any follow-up processing. Enough talk, let's get started.

Full Code

# -*- coding: utf-8 -*-
from numpy import zeros, array
import matplotlib.pyplot as plt


def file2matrix(filename):
    with open(filename, 'r') as fr:
        arrayOLines = fr.readlines()
    numberOfLines = len(arrayOLines)
    returnMat = zeros((numberOfLines, 3))  # all-zeros array that will hold the three features
    classLabelVector = []                  # container for the class labels
    index = 0
    for line in arrayOLines:
        # clean the data
        line = line.strip()
        listFromLine = line.split('\t')
        # convert the rating text into a number
        if listFromLine[3] == 'largeDoses':
            listFromLine[3] = 3
        elif listFromLine[3] == 'smallDoses':
            listFromLine[3] = 2
        else:
            listFromLine[3] = 1
        # store the three features in the three columns of one row
        returnMat[index, :] = [float(v) for v in listFromLine[0:3]]
        classLabelVector.append(int(listFromLine[3]))  # the last column is the class label
        index += 1
    return returnMat, classLabelVector


# convert the liking strength into a color
def colorOfDatingLable(num):
    datingLabels_rgb = []
    for i in range(len(num)):
        if num[i] == 3:
            datingLabels_rgb.append('red')
        elif num[i] == 2:
            datingLabels_rgb.append('green')
        else:
            datingLabels_rgb.append('black')
    return datingLabels_rgb


datingDataMat, datingLabels = file2matrix(
    'C:\\users\\mrlevo\\desktop\\machine_learning_in_action\\ch02\\datingTestSet.txt')

################## Create figure 1 ##################
plt.figure(1)               # create figure 1
ax1 = plt.subplot(1, 2, 1)  # in figure 1, create subplot 1
plt.title("Original Color")
plt.xlabel('Play game/time %')
plt.ylabel('Ice cream cost/week')
ax2 = plt.subplot(1, 2, 2)  # in figure 1, create subplot 2
plt.title("Improved Color")
plt.xlabel('Play game/time %')
plt.ylabel('Ice cream cost/week')

################## Create figure 2 ##################
plt.figure(2)               # create figure 2
ax3 = plt.subplot(2, 2, 1)  # in figure 2, create subplot 1
plt.title("Play Game & Ice Cream Cost")
plt.xlabel('Play game/time %')
plt.ylabel('Ice cream cost/week')
ax4 = plt.subplot(2, 2, 2)  # in figure 2, create subplot 2
plt.title("Fly Distance & Play Game")
plt.xlabel('Fly distance/year')
plt.ylabel('Play game/time %')
ax5 = plt.subplot(2, 2, 3)  # in figure 2, create subplot 3
plt.title("Fly Distance & Ice Cream Cost")
plt.xlabel('Fly distance/year')
plt.ylabel('Ice cream cost/week')

# plt.scatter(x[i], y[i], marker=style, s=size, color=np.random.rand(1, 3), label=str(i + 1))
# book version: scatter plot of the 2nd and 3rd columns; 3rd positional arg = size, 4th = color values
ax1.scatter(datingDataMat[:, 1], datingDataMat[:, 2], 15 * array(datingLabels), 15 * array(datingLabels))
# improved version: s=15*array(datingLabels) gives each label its own marker size,
# colorOfDatingLable turns each label into a named color
ax2.scatter(datingDataMat[:, 1], datingDataMat[:, 2],
            s=15 * array(datingLabels), color=colorOfDatingLable(datingLabels))
ax3.scatter(datingDataMat[:, 1], datingDataMat[:, 2],
            s=15 * array(datingLabels), color=colorOfDatingLable(datingLabels),
            label='largeDoses/smallDoses/didntLike')
ax4.scatter(datingDataMat[:, 0], datingDataMat[:, 1],
            s=15 * array(datingLabels), color=colorOfDatingLable(datingLabels),
            label='largeDoses/smallDoses/didntLike')
ax5.scatter(datingDataMat[:, 0], datingDataMat[:, 2],
            s=15 * array(datingLabels), color=colorOfDatingLable(datingLabels),
            label='largeDoses/smallDoses/didntLike')
ax3.legend(loc='upper right')
ax4.legend(loc='upper right')
ax5.legend(loc='upper right')
plt.show()
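colorOfDatingLable in the full code builds its color list with a Python loop. With NumPy, the same rule (3 → red, 2 → green, anything else → black) can be applied in one step by indexing an array of color names with the labels. This is a sketch of an alternative, not the book's code; colors_for_labels and PALETTE are hypothetical names, and it assumes integer labels like the ones file2matrix returns.

```python
import numpy as np

# Hypothetical palette: index 0 is padding so that labels 1, 2, 3 can index it directly
PALETTE = np.array(['unused', 'black', 'green', 'red'])


def colors_for_labels(labels):
    """Return an array of color names, one per integer label."""
    labels = np.asarray(labels)
    # Same rule as colorOfDatingLable: 3 -> red, 2 -> green, anything else -> black (index 1)
    idx = np.where((labels == 2) | (labels == 3), labels, 1)
    return PALETTE[idx]


print(colors_for_labels([3, 2, 1, 3]).tolist())
# → ['red', 'green', 'black', 'red']
```

The .tolist() result could be passed as color= in the ax2 to ax5 calls in place of colorOfDatingLable(datingLabels).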
 

The resulting plots are as follows.

Analysis of the Comparison Plots

From the plots above we can see where the red points cluster and how the dimensions relate to one another. Ice cream consumption does not seem to influence the rating at all, because the amount of ice cream is spread out roughly uniformly. The only really informative plot is the second one: men who fly around 40,000 miles a year and spend about 10% of their time playing games are the type Ms. XX likes and shows great interest in, while men who fly too much or play games very little do not win her favor.

Let's speculate about what Ms. XX is thinking. She seems to like men who are lively and well-informed, but who don't travel so often that they can't stay by her side. Judging by flight distance, a moderate amount of travel broadens one's horizons and makes a person more interesting, whereas a very long flight distance suggests either constant business trips or someone who roams the world. That brings uncertainty and insecurity, which I think is why Ms. XX doesn't like such men. A very short flight distance suggests little exposure to the outside world, so Ms. XX finds such men lacking in insight and fun. Think about it: how interesting can someone be who stays home all year? As the saying goes, reading ten thousand books is not as good as traveling ten thousand miles, and I think this is one of the factors Ms. XX uses to judge whether a person is interesting. Then there is game time. Too little of it may make Ms. XX think a man lacks humor, passion, and intelligence, since games reflect a person's reactions, emotional intelligence, and planning ability. So Ms. XX dislikes men who don't play games and who fly very long distances, which she can only read as being away from home for long stretches, while she keeps a moderate interest in homebodies. This goes to show that even homebodies have a spring of their own, hahaha.

Improved Code

To draw a scatter plot, the parameters are set like this (s is the marker size, measured as an area in points squared, not a radius):
# plt.scatter(x[i], y[i], marker=style, s=size, color=color, label=legend text)
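A small sketch of those parameters on toy data, assuming a reasonably recent Matplotlib and the non-interactive Agg backend so that it runs without a display. The points, sizes and label text are made up for illustration. Because s is an area, doubling it does not double the diameter of a marker.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy data: three points carrying the labels 1, 2 and 3
x = [1.0, 2.0, 3.0]
y = [2.0, 1.0, 3.0]
labels = [1, 2, 3]

fig, ax = plt.subplots()
points = ax.scatter(
    x, y,
    marker='o',                       # marker style
    s=[15 * n for n in labels],       # marker size as an area in points**2
    color=['black', 'green', 'red'],  # one explicit color per point
    label='toy points',               # text that ax.legend() will show
)
ax.legend(loc='upper right')
print(list(points.get_sizes()))  # the per-point sizes stored on the collection
plt.close(fig)
```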
There is also a line of code in the book that I'm sure some readers find strange:

ax1.scatter(datingDataMat[:,1], datingDataMat[:,2], 15*array(datingLabels), 15*array(datingLabels))  # scatter plot of the 2nd and 3rd columns

The first part is easy to understand: the first 15*array(datingLabels) specifies the size of the points, while the second 15*array(datingLabels) represents the color. Honestly, when I looked through the color parameters of scatter, I could not find this usage. Maybe I just missed it, so if anyone knows why a color can be expressed this way, please tell me. What I do know is that a color can be given as a string such as 'r' or as an RGB triple of three primary-color values, but I have never seen a single number used as a color. So I made a small improvement and wrote a colorOfDatingLable function that converts the label values into color values. The statement used is:

ax2.scatter(datingDataMat[:,1], datingDataMat[:,2], s=15*array(datingLabels), color=colorOfDatingLable(datingLabels))  # scatter plot of the 2nd and 3rd columns; s=15*array(datingLabels) gives each label a different marker size
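On the question of the book's fourth argument: as far as I understand Matplotlib's scatter documentation, the fourth positional parameter is c. When c is a sequence of numbers, Matplotlib does not read them as colors directly. It scales them into the range 0 to 1 with a Normalize object (vmin and vmax default to the smallest and largest values) and then looks them up in the colormap cmap (jet by default in 1.x, viridis from 2.0 on). That would explain why 15*array(datingLabels) yields three distinct colors: there are three distinct values. The sketch below checks this by reproducing the mapping by hand; the labels are hypothetical, and cmap='viridis' is passed explicitly so the result does not depend on the version's default.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

labels = np.array([1, 2, 3, 3, 1])  # hypothetical class labels
values = 15 * labels                # what the book passes as the 4th argument

fig, ax = plt.subplots()
# 3rd positional argument is s (marker size), 4th is c (numbers to be color-mapped)
points = ax.scatter(np.arange(5), np.arange(5), values, values, cmap='viridis')
fig.canvas.draw()  # drawing makes Matplotlib convert the numbers to RGBA face colors

# Reproduce the mapping by hand: scale between min and max to [0, 1], then look up the colormap
norm = Normalize(vmin=values.min(), vmax=values.max())
manual = plt.get_cmap('viridis')(norm(values))

print(np.allclose(points.get_facecolors(), manual))  # → True
plt.close(fig)
```

This is also why the explicit color list of the improved version is easier to control: with numeric c, the actual colors depend on the colormap and on the range of the values.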

I also plotted the two side by side for comparison, as shown in the figure.

By comparison, the revised plot is clearer and its features stand out more.

You May Also Need to Know

Some common NumPy usage: see NumPy Quickstart.
Common Matplotlib examples: thanks to @Star Lighting – Python Chart Drawing: Getting Started with the Matplotlib Gallery.
This post comes from @MrLevo520 – Machine Learning K-Nearest-Neighbor Algorithm (Python Description) Basics, and covers only the data-processing part.
Matplotlib official documentation.

Thanks

NumPy Quickstart
@Star Lighting – Python Chart Drawing: Getting Started with the Matplotlib Gallery
@MrLevo520 – Machine Learning K-Nearest-Neighbor Algorithm (Python Description) Basics
Matplotlib official documentation
@MrLevo520 – (Big) Data Processing: Preprocessing and Migration from TXT to MySQL

