Data Standardization and Discretization in Python Data Analysis

Last Update:2018-03-07 Source: Internet

Author: User

This article walks through the standardization and discretization of data in Python data analysis, for your reference. The details are as follows:

Standardization

1. Deviation Standardization

Deviation standardization (also known as min-max normalization) is a linear transformation of the original data that maps the results onto the range [0, 1]. This makes the data easier to process and eliminates the influence of units and of differences in scale.
The basic formula is:

x' = (x - min) / (max - min)

Code:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# author: M10
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
import mysql.connector

# Connect to the local database
conn = mysql.connector.connect(host='localhost', user='root', passwd='000000', db='python')
sql = 'select price, comment from taob'  # SQL statement
data = pd.read_sql(sql, conn)  # Get data

# Deviation standardization: note the parentheses around the numerator
data1 = (data - data.min()) / (data.max() - data.min())
print(data1)
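For readers without the MySQL table at hand, the same transformation can be sketched on an in-memory DataFrame. The sample values and the helper name min_max_scale below are assumptions made for illustration; they are not part of the original article.

```python
import pandas as pd

# Hypothetical sample standing in for the `price` and `comment` columns
# that the article reads from the `taob` table.
data = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 50.0],
    "comment": [0, 5, 10, 20],
})


def min_max_scale(df):
    """Map every column linearly onto [0, 1]: (x - min) / (max - min)."""
    return (df - df.min()) / (df.max() - df.min())


scaled = min_max_scale(data)
print(scaled)
```

Each column is scaled independently, so its smallest value becomes 0 and its largest becomes 1. A column whose values are all identical would give 0 / 0, which pandas reports as NaN, so such columns need separate handling.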

2. Standard Deviation Standardization

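Standard deviation standardization (zero-mean standardization, also called z-score standardization) eliminates the influence of units and of each variable's own variation. After the transformation, every variable has a mean of 0 and a standard deviation of 1.
The basic formula is:

x' = (x - mean) / standard deviation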

Python code:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# author: M10
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
import mysql.connector

# Connect to the local database
conn = mysql.connector.connect(host='localhost', user='root', passwd='000000', db='python')
sql = 'select price, comment from taob'  # SQL statement
data = pd.read_sql(sql, conn)  # Get data

# Standard deviation standardization: note the parentheses around the numerator
data1 = (data - data.mean()) / data.std()
print(data1)
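The following is a minimal, database-free sketch of the same formula. The sample values and the helper name z_score are illustrative assumptions. It also checks that the result has mean 0 and standard deviation 1.

```python
import pandas as pd

# Hypothetical sample standing in for the `price` and `comment` columns.
data = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "comment": [2, 4, 6, 8],
})


def z_score(df):
    """Apply (x - mean) / std column by column.

    DataFrame.std() defaults to the sample standard deviation (ddof=1).
    """
    return (df - df.mean()) / df.std()


standardized = z_score(data)
print(standardized)
print(standardized.mean())  # approximately 0 for every column
print(standardized.std())   # approximately 1 for every column
```

Note that pandas uses ddof=1 by default while NumPy's np.std uses ddof=0, so the two give slightly different results on small samples. Pass ddof explicitly if the distinction matters.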

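3. Decimal Scaling Standardization

Decimal scaling standardization eliminates the influence of units by shifting the decimal point of the values.
The basic formula is:

x' = x / 10^j

where j = ceil(lg(max(|x|))), that is, the base-10 logarithm of the largest absolute value of x, rounded up to an integer.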

Implementation Code:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# author: M10
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
import mysql.connector

# Connect to the local database
conn = mysql.connector.connect(host='localhost', user='root', passwd='000000', db='python')
sql = 'select price, comment from taob'  # SQL statement
data = pd.read_sql(sql, conn)  # Get data

# Decimal scaling standardization
# abs() takes the absolute value; ceil() rounds up to an integer
j = np.ceil(np.log10(data.abs().max()))
data1 = data / 10 ** j
print(data1)
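As a self-contained illustration of decimal scaling, the sketch below uses a small hypothetical DataFrame and a helper named decimal_scale (both are assumptions, not part of the original article) and returns the exponent j alongside the scaled data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; negative values are included to show why abs() is needed.
data = pd.DataFrame({
    "price": [12.0, -350.0, 999.0],
    "comment": [3, 45, 78],
})


def decimal_scale(df):
    """Divide each column by 10**j, where j = ceil(log10(max(|x|)))."""
    j = np.ceil(np.log10(df.abs().max()))  # one exponent per column
    return df / 10 ** j, j


scaled, j = decimal_scale(data)
print(j)       # exponent chosen for each column
print(scaled)
```

With this choice of j, every scaled value lies in [-1, 1]. When the largest absolute value is an exact power of ten (for example 100), the result reaches 1 itself. A column made up entirely of zeros has no defined j and needs separate handling.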

Discretization

Discretization is a common technique in programming that can effectively reduce time complexity. The basic idea is to keep only the values that are actually needed out of the many possible ones. Discretization can speed up an inefficient algorithm, or even make feasible an algorithm that would otherwise be impractical.

1. Equal-Width Discretization

Equal-width discretization splits continuous data into intervals of the same width. One advantage is that the processed data takes a finite number of values rather than an unbounded one.
Use the pandas cut method. For non-equal widths, you only need to change the second parameter of cut to a list of bin edges. For example, with [1, 100, 300, 200000] as the second parameter, each pair of adjacent edges defines one interval, so the data is divided into three intervals.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# author: M10
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
import mysql.connector

# Connect to the local database
conn = mysql.connector.connect(host='localhost', user='root', passwd='000000', db='python')
sql = 'select price, comment from taob'  # SQL statement
data = pd.read_sql(sql, conn)  # Get data

# Discretization
data1 = data['price'].values  # Get the price column as a one-dimensional array
# cut requires unique labels, one per interval (these names are illustrative)
label = ['very low', 'low', 'medium', 'high', 'very high']
data2 = pd.cut(data1, 5, labels=label)
print(data2)
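The two ways of calling cut described above can be compared on a small in-memory Series. The prices and label names below are hypothetical; the bin edges [1, 100, 300, 200000] are the ones mentioned in the text.

```python
import pandas as pd

# Hypothetical prices standing in for the `price` column.
prices = pd.Series([5, 20, 80, 150, 250, 400, 1000], name="price")

# Equal width: an integer asks cut to split the value range into that many
# intervals of the same width. retbins=True also returns the computed edges.
labels = ["low", "medium", "high", "very high"]
equal_width, edges = pd.cut(prices, 4, labels=labels, retbins=True)
print(equal_width.tolist())
print(edges)

# Non-equal width: a list of edges defines the intervals explicitly.
# Four edges give three intervals, each closed on the right by default.
custom = pd.cut(prices, [1, 100, 300, 200000])
print(custom.value_counts(sort=False))
```

Because the single large value stretches the overall range, equal-width bins can end up very unevenly populated. That is one reason to supply custom edges or to use equal-frequency bins instead.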

2. Equal-Frequency Discretization

Equal-frequency discretization chooses the interval boundaries so that every interval contains the same number of data points.
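The article gives no code for this method. One possible sketch uses the pandas qcut function, which places the bin edges at quantiles of the data. The sample prices and labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical prices standing in for the `price` column.
prices = pd.Series([5, 20, 80, 150, 250, 400, 1000, 3000], name="price")

# qcut with q=4 splits at the quartiles, so each interval receives
# (as nearly as possible) the same number of values.
labels = ["low", "medium", "high", "very high"]
equal_freq = pd.qcut(prices, 4, labels=labels)

print(equal_freq.tolist())
print(equal_freq.value_counts(sort=False))
```

When the data contains many repeated values, several quantiles can coincide and qcut raises an error by default. Passing duplicates="drop" merges the coinciding edges, at the cost of fewer intervals.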

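3. One-Dimensional Clustering Discretization

Cluster the values of a single attribute with a clustering algorithm, then treat each cluster as one interval, so that values falling in the same cluster receive the same label.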
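The article does not show an implementation. The sketch below assumes k-means as the clustering algorithm and writes a plain one-dimensional version in NumPy to stay self-contained; libraries such as scikit-learn offer a fuller k-means. The function name kmeans_1d, the sample prices, and the labels are all hypothetical.

```python
import numpy as np
import pandas as pd


def kmeans_1d(values, k, n_iter=100):
    """Plain 1-D k-means (Lloyd's algorithm); returns sorted cluster centers."""
    x = np.asarray(values, dtype=float)
    # Deterministic start: spread the initial centers over the quantiles.
    centers = np.quantile(x, np.linspace(0, 1, k))
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        nearest = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty).
        updated = np.array([
            x[nearest == i].mean() if np.any(nearest == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return np.sort(centers)


# Hypothetical prices with three visible groups.
prices = pd.Series([10, 12, 14, 100, 105, 110, 1000, 1100], name="price")
centers = kmeans_1d(prices, k=3)

# Interval boundaries are the midpoints between neighbouring centers,
# closed off with -inf and +inf so that every value falls into a bin.
edges = np.concatenate(([-np.inf], (centers[:-1] + centers[1:]) / 2, [np.inf]))
clustered = pd.cut(prices, edges, labels=["low", "medium", "high"])

print(centers)
print(clustered.tolist())
```

Unlike equal-width or equal-frequency bins, the boundaries here follow the natural gaps in the data. The result depends on the chosen k and on the initial centers, and two clusters that converge to the same center would produce duplicate edges that cut rejects.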

That is all for this article. I hope it is helpful for your learning.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
