Data Standardization and Discretization in Python Data Analysis
This article walks through standardizing and discretizing data in Python data analysis, for your reference. The details are as follows:
Standardization
1. Deviation Standardization
Deviation standardization (also called min-max scaling) is a linear transformation of the original data that maps the results to the [0, 1] range. This makes the data easier to process and eliminates the influence of units and of differing ranges of variation.
The basic formula is:
x'=(x-min)/(max-min)
Code:
    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    # author: M10
    import numpy as np
    import pandas as pd
    import matplotlib.pylab as plt
    import mysql.connector

    # Connect to the local database
    conn = mysql.connector.connect(host='localhost', user='root',
                                   passwd='000000', db='python')
    sql = 'select price, comment from taob'  # SQL statement
    data = pd.read_sql(sql, conn)  # Get data

    # Deviation (min-max) standardization, applied column by column
    data1 = (data - data.min()) / (data.max() - data.min())
    print(data1)
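The script above needs a local MySQL database. As a minimal sketch of the same formula that runs anywhere, the following uses a small made-up DataFrame; the sample values are assumptions chosen for illustration, not data from the `taob` table.

```python
import pandas as pd

# Hypothetical sample data standing in for the price and comment columns.
data = pd.DataFrame({
    "price": [10.0, 20.0, 40.0, 50.0],
    "comment": [0, 5, 10, 20],
})

# x' = (x - min) / (max - min), computed separately for each column.
col_min = data.min()
col_range = data.max() - col_min
scaled = (data - col_min) / col_range

print(scaled)
```

Each column now runs from 0 to 1. A column whose values are all identical has a range of zero, so this formula would produce NaN for it; such columns need separate handling.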
2. Standard Deviation Standardization
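Also known as zero-mean standardization. It eliminates the influence of units and of differences in how widely each variable varies: after the transformation every column has mean 0 and standard deviation 1.
The basic formula is:
x' = (x - mean) / std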
Python code:
    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    # author: M10
    import numpy as np
    import pandas as pd
    import matplotlib.pylab as plt
    import mysql.connector

    # Connect to the local database
    conn = mysql.connector.connect(host='localhost', user='root',
                                   passwd='000000', db='python')
    sql = 'select price, comment from taob'  # SQL statement
    data = pd.read_sql(sql, conn)  # Get data

    # Standard deviation (zero-mean) standardization, applied column by column
    data1 = (data - data.mean()) / data.std()
    print(data1)
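A self-contained sketch of the same calculation, using made-up sample values rather than the database table. One detail worth knowing: pandas `std()` defaults to the sample standard deviation (`ddof=1`), while NumPy's `std` defaults to the population version (`ddof=0`), so the two give slightly different results on small data sets.

```python
import pandas as pd

# Hypothetical sample data standing in for the price and comment columns.
data = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "comment": [1, 2, 3, 4, 5],
})

# x' = (x - mean) / std, using the pandas default (sample std, ddof=1).
standardized = (data - data.mean()) / data.std()

# The same formula with the population standard deviation (ddof=0).
population = (data - data.mean()) / data.std(ddof=0)

print(standardized)
print(population)
```

Which variant to use depends on the downstream method; the important thing is to apply the same one consistently.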
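3. Decimal Scaling Standardization
Eliminates the influence of units by shifting the decimal point so that the values fall within [-1, 1].
The basic formula is:
x' = x / 10^j
where j = ceil(lg(max(|x|))), that is, the base-10 logarithm of the largest absolute value of x, rounded up to an integer.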
Implementation Code:
    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    # author: M10
    import numpy as np
    import pandas as pd
    import matplotlib.pylab as plt
    import mysql.connector

    # Connect to the local database
    conn = mysql.connector.connect(host='localhost', user='root',
                                   passwd='000000', db='python')
    sql = 'select price, comment from taob'  # SQL statement
    data = pd.read_sql(sql, conn)  # Get data

    # Decimal scaling standardization
    # abs() takes the absolute value; ceil() rounds up to an integer
    j = np.ceil(np.log10(data.abs().max()))
    data1 = data / 10 ** j
    print(data1)
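The following sketch applies the same decimal scaling to a small made-up DataFrame (the values are assumptions for illustration), so the per-column exponent j can be checked by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the price and comment columns.
data = pd.DataFrame({
    "price": [12.0, -345.0, 78.0],
    "comment": [3, 45, 9],
})

# One exponent per column: the largest |price| is 345, so j = 3;
# the largest |comment| is 45, so j = 2.
j = np.ceil(np.log10(data.abs().max()))

# Divide each column by its own power of ten.
scaled = data / 10 ** j

print(j)
print(scaled)
```

Two edge cases follow from the formula: a column whose largest absolute value is an exact power of ten (for example 100) is scaled to a maximum of 1.0 rather than below 1, and a column that contains only zeros cannot be scaled this way because the logarithm of 0 is undefined.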
Discretization
Discretization is a common technique in programming that can effectively reduce time complexity. The basic idea is to keep, out of the many possible values, only those that actually need to be used. Discretization can speed up an inefficient algorithm, or even make feasible an algorithm that would otherwise be impractical.
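One common reading of this idea is rank-based discretization: replace each value with its position among the distinct values that actually occur, so that later steps work with a small range of integers. The sketch below is an illustration under that assumption; the `compress` helper is a hypothetical name, not something from the article.

```python
def compress(values):
    """Map each value to its 0-based rank among the distinct values."""
    distinct = sorted(set(values))
    rank = {v: i for i, v in enumerate(distinct)}
    return [rank[v] for v in values], distinct


codes, distinct = compress([1_000_000, 5, 70_000, 5])
print(codes)     # [2, 0, 1, 0]
print(distinct)  # [5, 70000, 1000000]
```

Values spread over a range of a million are reduced to three codes, and the `distinct` list allows the original values to be recovered when needed.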
1. Equal-Width Discretization
Equal-width discretization divides continuous data into intervals of the same width. One advantage is that the processed data takes a finite set of values rather than infinitely many.
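Use the pandas cut method. For unequal widths, only the second parameter of cut needs to change: pass a list of bin edges instead of a bin count, where n + 1 edges produce n intervals. For example, passing [1, 100, 300, 200000] as the second parameter divides the data into three intervals: (1, 100], (100, 300], and (300, 200000].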
    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    # author: M10
    import numpy as np
    import pandas as pd
    import matplotlib.pylab as plt
    import mysql.connector

    # Connect to the local database
    conn = mysql.connector.connect(host='localhost', user='root',
                                   passwd='000000', db='python')
    sql = 'select price, comment from taob'  # SQL statement
    data = pd.read_sql(sql, conn)  # Get data

    # Equal-width discretization
    data1 = data['price'].T.values  # Get the prices as a one-dimensional array
    # pd.cut needs one unique label per bin
    label = ['very low', 'low', 'medium', 'high', 'very high']
    data2 = pd.cut(data1, 5, labels=label)
    print(data2)
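For unequal widths, a minimal sketch with explicit bin edges might look like the following. The prices and label names are made up for illustration; the four edges define three intervals.

```python
import pandas as pd

# Hypothetical prices; the edges define the intervals
# (1, 100], (100, 300] and (300, 200000].
prices = pd.Series([5, 80, 150, 250, 1200, 150000])
edges = [1, 100, 300, 200000]
labels = ["low", "medium", "high"]  # one label per interval

binned = pd.cut(prices, bins=edges, labels=labels)

print(binned)
print(binned.value_counts(sort=False))
```

By default the intervals are closed on the right, and any value that falls outside the outermost edges becomes NaN, so the first and last edges should cover the full range of the data.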
2. Equal-Frequency Discretization
Equal-frequency discretization places the same number of data points into each interval.
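The article gives no code for this method. One way to sketch it is with the pandas `qcut` function, which chooses bin edges at quantiles of the data; the prices and label names below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical, heavily skewed prices.
prices = pd.Series([5, 12, 18, 25, 40, 90, 300, 1500])

# Four bins, each holding a quarter of the data points.
# retbins=True also returns the bin edges that were chosen.
binned, edges = pd.qcut(prices, q=4, labels=["q1", "q2", "q3", "q4"],
                        retbins=True)
counts = binned.value_counts(sort=False)

print(counts)
print(edges)
```

Each bin receives two of the eight values even though the interval widths differ greatly. When the data contains many repeated values, the quantile edges can coincide; `qcut` then raises an error unless `duplicates='drop'` is passed, in which case the bins are no longer exactly equal in size.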
3. One-Dimensional Clustering Discretization
Cluster the values of a single attribute, then treat each cluster as one discrete category.
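The article does not say which clustering algorithm to use. As one possible sketch, the following runs a minimal k-means (Lloyd's algorithm) on one-dimensional data using only NumPy. The `kmeans_1d` helper, the quantile-based starting centers, and the sample prices are all assumptions made for illustration; in practice a library implementation such as scikit-learn's `KMeans` would be a more robust choice.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=100):
    """Minimal k-means for one-dimensional data (illustrative only)."""
    x = np.asarray(values, dtype=float)
    # Deterministic start: spread the initial centers over the data's quantiles.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # Move each center to the mean of its points; an empty cluster stays put.
        new_centers = np.array([
            x[labels == i].mean() if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Sort the centers so that label 0 is the lowest-valued cluster.
    centers = np.sort(centers)
    labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    return labels, centers


# Hypothetical prices forming three obvious groups.
prices = [10, 12, 11, 200, 210, 190, 5000, 5200]
labels, centers = kmeans_1d(prices, k=3)

# Midpoints between neighboring centers can serve as bin edges.
edges = (centers[:-1] + centers[1:]) / 2

print(labels)   # [0 0 0 1 1 1 2 2]
print(centers)
print(edges)
```

The resulting edges could be passed to the pandas cut method, with the data's minimum and maximum added as outer edges, to label new values by cluster.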
That is all for this article. I hope it is helpful for your learning.