Python Data Analysis: Data Standardization and Discretization Explained

Source: Internet
Author: User

This article walks through the standardization and discretization of data in Python data analysis, for your reference. The details are as follows:

Standardization

1. Deviation Standardization

Deviation standardization (also known as min-max normalization) is a linear transformation of the original data that maps the results to the [0, 1] range. This makes the data easier to process and eliminates the influence of units and of differences in the scale of variation.
The basic formula is:

x'=(x-min)/(max-min)

Code:

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    # author: M10
    import numpy as np
    import pandas as pd
    import matplotlib.pylab as plt
    import mysql.connector

    # Connect to the local database
    conn = mysql.connector.connect(host='localhost', user='root', passwd='000000', db='python')
    sql = 'select price, comment from taob'  # SQL statement
    data = pd.read_sql(sql, conn)  # Get data

    # Deviation standardization: subtract the column minimum, divide by the column range
    data1 = (data - data.min()) / (data.max() - data.min())
    print(data1)
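
To try the formula without a database, here is a minimal sketch that uses a small in-memory DataFrame. The sample values and the helper name `min_max_scale` are made up for illustration and are not part of the original code; only the column names `price` and `comment` follow the query above.

```python
import pandas as pd

# Hypothetical sample standing in for the `price` and `comment` columns.
data = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 50.0],
    "comment": [0, 5, 10, 20],
})


def min_max_scale(df):
    """Map each column linearly onto [0, 1]: x' = (x - min) / (max - min)."""
    value_range = df.max() - df.min()
    # Note: a constant column has max == min, so this sketch yields NaN there.
    return (df - df.min()) / value_range


scaled = min_max_scale(data)
print(scaled)
```

Each column is scaled independently, so after the transformation the smallest value in every column is 0 and the largest is 1.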

2. Standard Deviation Standardization

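Also called zero-mean standardization. It eliminates the influence of units and of differences in how much each variable varies: after the transformation, every column has mean 0 and standard deviation 1.
The basic formula is:

x' = (x-mean)/(standard deviation)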

Python code:

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    # author: M10
    import numpy as np
    import pandas as pd
    import matplotlib.pylab as plt
    import mysql.connector

    # Connect to the local database
    conn = mysql.connector.connect(host='localhost', user='root', passwd='000000', db='python')
    sql = 'select price, comment from taob'  # SQL statement
    data = pd.read_sql(sql, conn)  # Get data

    # Standard deviation standardization: subtract the column mean, divide by the column std
    data1 = (data - data.mean()) / data.std()
    print(data1)
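
The same transformation can be checked on a small in-memory DataFrame. This is a sketch with made-up sample values; the helper name `z_score` and its `ddof` parameter are illustrative choices, not part of the original code. One detail worth knowing: pandas `std()` uses the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to the population version (`ddof=0`), so the two give slightly different results on small data sets.

```python
import pandas as pd

# Hypothetical sample standing in for the `price` and `comment` columns.
data = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "comment": [2.0, 4.0, 6.0, 8.0],
})


def z_score(df, ddof=1):
    """Standardize each column: x' = (x - mean) / std.

    ddof=1 matches the pandas default (sample standard deviation);
    pass ddof=0 for the population standard deviation.
    """
    return (df - df.mean()) / df.std(ddof=ddof)


standardized = z_score(data)
print(standardized)
print(standardized.mean())  # approximately 0 for every column
print(standardized.std())   # approximately 1 for every column
```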

3. Decimal Scaling Standardization

Eliminates the influence of units by moving the decimal point.
The basic formula is:

x' = x/10^j

where j = ceil(lg(max(|x|))), that is, the base-10 logarithm of the largest absolute value of x, rounded up to an integer.

Implementation Code:

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    # author: M10
    import numpy as np
    import pandas as pd
    import matplotlib.pylab as plt
    import mysql.connector

    # Connect to the local database
    conn = mysql.connector.connect(host='localhost', user='root', passwd='000000', db='python')
    sql = 'select price, comment from taob'  # SQL statement
    data = pd.read_sql(sql, conn)  # Get data

    # Decimal scaling standardization
    # abs() takes the absolute value; ceil() rounds the logarithm up to an integer
    j = np.ceil(np.log10(data.abs().max()))
    data1 = data / 10 ** j
    print(data1)
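
As a quick check of the formula, the sketch below applies it to a small in-memory DataFrame. The sample values and the helper name `decimal_scale` are assumptions made for illustration. The exponent j is computed per column, so each column is divided by its own power of ten.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the `price` and `comment` columns.
data = pd.DataFrame({
    "price": [12.0, 250.0, 999.0],
    "comment": [-3.0, 45.0, 8.0],
})


def decimal_scale(df):
    """Divide each column by 10^j, with j = ceil(log10(max(|x|))) per column."""
    # Note: an all-zero column gives log10(0) = -inf and is not handled here.
    j = np.ceil(np.log10(df.abs().max()))
    return df / 10 ** j


scaled = decimal_scale(data)
print(scaled)
```

Here the largest absolute price is 999, so j = 3 and prices are divided by 1000; the largest absolute comment value is 45, so j = 2 and that column is divided by 100. All results fall within [-1, 1].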

Discretization

Discretization is a common technique in data analysis: it maps continuous values onto a finite set of intervals. The basic idea is to keep only the distinctions that actually need to be used out of the many possible values. This can simplify later processing, and some algorithms that expect categorical input only become usable after the data has been discretized.

1. Equal-Width Discretization

Equal-width discretization divides the range of continuous data into intervals of the same width. One advantage is that the processed data takes a finite set of values rather than infinitely many.
Use the pandas cut method. For unequal widths, you only need to change the second parameter of cut to a list of interval edges. For example, the edges [1, 100, 300, 200000] divide the data into three intervals: (1, 100], (100, 300], and (300, 200000].

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    # author: M10
    import numpy as np
    import pandas as pd
    import matplotlib.pylab as plt
    import mysql.connector

    # Connect to the local database
    conn = mysql.connector.connect(host='localhost', user='root', passwd='000000', db='python')
    sql = 'select price, comment from taob'  # SQL statement
    data = pd.read_sql(sql, conn)  # Get data

    # Discretization
    data1 = data['price'].T.values  # Get the prices as a one-dimensional array
    # One label per interval; pandas requires the labels to be unique
    labels = ['very low', 'low', 'medium', 'high', 'very high']
    data2 = pd.cut(data1, 5, labels=labels)  # 5 intervals of equal width
    print(data2)
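
The difference between equal-width bins and custom edges is easier to see on a small example. The following is a sketch with made-up prices; the label names are also illustrative. Because the sample contains one very large price, five equal-width bins put almost everything into the first interval, while the custom edges mentioned above spread the values more usefully.

```python
import pandas as pd

# Hypothetical, heavily skewed price sample.
prices = pd.Series([5, 80, 150, 300, 5000, 120000], name="price")

# Equal width: pandas splits the range [min, max] into 5 intervals of the same width.
width_labels = ["very low", "low", "medium", "high", "very high"]
equal_width = pd.cut(prices, 5, labels=width_labels)

# Custom edges: four edges define three right-closed intervals,
# (1, 100], (100, 300] and (300, 200000].
edges = [1, 100, 300, 200000]
custom = pd.cut(prices, bins=edges, labels=["low", "medium", "high"])

print(pd.DataFrame({"price": prices, "equal_width": equal_width, "custom": custom}))
```

Values that fall outside the custom edges would become NaN, so the first and last edge should cover the whole range of the data.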

2. Equal-Frequency Discretization

Choose the interval edges so that each interval contains roughly the same number of data points.
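
The article gives no code for this method. One way to sketch it is with the pandas `qcut` function, which places the edges at quantiles of the data. The sample prices and the label names below are made up for illustration.

```python
import pandas as pd

# Hypothetical price sample with 8 values.
prices = pd.Series([5, 80, 150, 300, 5000, 120000, 12, 45], name="price")

# q=4 places the edges at the quartiles, so each interval receives
# about a quarter of the data points.
equal_freq = pd.qcut(prices, q=4, labels=["q1", "q2", "q3", "q4"])

counts = equal_freq.value_counts()
print(equal_freq)
print(counts)
```

With many repeated values, several quantiles can coincide and `qcut` raises an error by default; its `duplicates='drop'` option merges the duplicate edges, at the cost of producing fewer intervals than requested.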

3. One-Dimensional Clustering Discretization

Cluster the values of a single attribute, then treat each cluster as one interval.
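
The article does not say which clustering algorithm to use. The sketch below assumes k-means, written out in plain NumPy so that it has no extra dependencies, and then cuts the data at the midpoints between neighboring cluster centers. The function name `kmeans_1d`, the sample prices, and the labels are all hypothetical.

```python
import numpy as np
import pandas as pd


def kmeans_1d(values, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) on a 1-D array; returns sorted centers."""
    values = np.asarray(values, dtype=float)
    # Deterministic start: spread the initial centers over the quantiles.
    centers = np.quantile(values, np.linspace(0, 1, k))
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        assign = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        # Move each center to the mean of its values (keep it if it has none).
        new_centers = np.array([
            values[assign == i].mean() if np.any(assign == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)


# Hypothetical prices forming three visible groups.
prices = pd.Series([10, 12, 11, 50, 52, 49, 200, 210, 190], name="price")
centers = kmeans_1d(prices.to_numpy(), k=3)

# Interval edges are the midpoints between neighboring cluster centers.
midpoints = (centers[:-1] + centers[1:]) / 2
edges = np.concatenate(([-np.inf], midpoints, [np.inf]))
clustered = pd.cut(prices, bins=edges, labels=["low", "medium", "high"])
print(clustered)
```

Unlike equal-width or equal-frequency binning, the interval edges here follow the natural gaps in the data, but the result depends on the chosen number of clusters k.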

That is all for this article. I hope it is helpful for your learning.
