Data merging, conversion, filtering, and sorting for Python data cleansing

Source: Internet
Author: User
This article introduces data merging, conversion, filtering, and sorting for data cleansing in Python. The previous article covered basic operations with pandas; here we look more closely at manipulating data.

Data cleansing has always been an extremely important part of data analysis.

Data merging

In pandas, the merge function joins two DataFrames on one or more key columns.

import numpy as np
import pandas as pd

data1 = pd.DataFrame({'level': ['a', 'b', 'c', 'd'],
                      'number': [1, 3, 5, 7]})
data2 = pd.DataFrame({'level': ['a', 'b', 'c', 'e'],
                      'number': [2, 3, 6, 10]})
print(data1)
print(data2)
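The snippet above only builds the two frames. Below is a minimal sketch of the merge call itself on frames shaped like data1 and data2. It contrasts pandas' default key choice, which uses every shared column, with an explicit on='level'. The variable names both_keys and by_level are illustrative only.

```python
import pandas as pd

data1 = pd.DataFrame({'level': ['a', 'b', 'c', 'd'], 'number': [1, 3, 5, 7]})
data2 = pd.DataFrame({'level': ['a', 'b', 'c', 'e'], 'number': [2, 3, 6, 10]})

# With no key given, merge joins on every column the frames share
# ('level' and 'number'), so only rows equal in both columns survive.
both_keys = pd.merge(data1, data2)
print(both_keys)

# Restricting the key to 'level' keeps both 'number' columns,
# disambiguated with the default suffixes _x and _y.
by_level = pd.merge(data1, data2, on='level')
print(by_level)
```

The first result has a single row (level 'b', number 3). That is the only row whose values match in both shared columns.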


By default, merge performs an inner join on the columns the two DataFrames have in common. Other join types, such as outer, right, and left, are selected with the how keyword.
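Here is a short sketch comparing the four join types on the shared 'level' column. The frames left and right are hypothetical stand-ins that mirror data1 and data2.

```python
import pandas as pd

# Hypothetical sample data mirroring data1/data2 in the text.
left = pd.DataFrame({'level': ['a', 'b', 'c', 'd'], 'number': [1, 3, 5, 7]})
right = pd.DataFrame({'level': ['a', 'b', 'c', 'e'], 'number': [2, 3, 6, 10]})

# One merge per join type; 'inner' is what merge does when how is omitted.
results = {
    how: pd.merge(left, right, on='level', how=how)
    for how in ['inner', 'outer', 'left', 'right']
}

for how, merged in results.items():
    print(how)
    print(merged)
```

If a row has no partner on the other side, it gets NaN in the columns that come from that side. This is why number_y is missing for 'd' in the left join and number_x is missing for 'e' in the right join.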

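When the key columns have different names in the two DataFrames, name them with left_on and right_on:

data3 = pd.DataFrame({'level1': ['a', 'b', 'c', 'd'],
                      'number1': [1, 3, 5, 7]})
data4 = pd.DataFrame({'level2': ['a', 'b', 'c', 'e'],
                      'number2': [2, 3, 6, 10]})
print(pd.merge(data3, data4, left_on='level1', right_on='level2'))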
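With left_on/right_on, both key columns are kept in the result. The sketch below shows that and drops the redundant one. Dropping level2 is just one reasonable choice: after an inner join both key columns hold the same values.

```python
import pandas as pd

data3 = pd.DataFrame({'level1': ['a', 'b', 'c', 'd'], 'number1': [1, 3, 5, 7]})
data4 = pd.DataFrame({'level2': ['a', 'b', 'c', 'e'], 'number2': [2, 3, 6, 10]})

merged = pd.merge(data3, data4, left_on='level1', right_on='level2')
# Columns are level1, number1, level2, number2; the two key columns are identical
# for an inner join, so one of them can be dropped.
merged = merged.drop(columns='level2')
print(merged)
```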


Merge overlapping data

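Sometimes two datasets overlap: they describe the same rows, but each has gaps. In this case, use the combine_first method. It aligns the two objects on their index and fills missing values in the calling object with values from the other object.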

data3 = pd.DataFrame({'level': ['a', 'b', 'c', 'd'],
                      'number1': [1, 3, 5, np.nan]})
data4 = pd.DataFrame({'level': ['a', 'b', 'c', 'e'],
                      'number2': [2, np.nan, 6, 10]})
print(data3.combine_first(data4))


This works like np.where(pd.isnull(a), b, a): wherever the calling object a has a missing value, the value from b is used instead.
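To make the comparison concrete, here is a small sketch on two hypothetical Series a and b that share the same index. For such inputs, combine_first and the np.where expression give the same values. One difference is worth noting: combine_first aligns on index labels, while np.where works purely by position.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with identical indexes (0..3).
a = pd.Series([1.0, np.nan, 3.0, np.nan])
b = pd.Series([10.0, 20.0, np.nan, 40.0])

# Keep a's value unless it is missing; then fall back to b.
filled = a.combine_first(b)

# The positional NumPy equivalent, wrapped back into a Series.
manual = pd.Series(np.where(pd.isnull(a), b, a), index=a.index)

print(filled.tolist())
print(manual.tolist())
```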

Reshaping and pivoting

This topic was touched on in the previous pandas article. Reshaping mainly uses the reshape method, while pivoting between rows and columns mainly uses the stack and unstack methods.

data = pd.DataFrame(np.arange(12).reshape(3, 4),
                    columns=['a', 'b', 'c', 'd'],
                    index=['wang', 'li', 'zhang'])
print(data)
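The example above only shows reshape. Below is a minimal sketch of stack and unstack on the same frame. The round trip is checked by label rather than by position, because the row order after unstack may differ between pandas versions.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12).reshape(3, 4),
                    columns=['a', 'b', 'c', 'd'],
                    index=['wang', 'li', 'zhang'])

# stack moves the column labels into an inner index level,
# giving a Series with a two-level (row, column) index.
stacked = data.stack()
print(stacked)

# unstack moves that inner level back out to the columns.
restored = stacked.unstack()
print(restored)
```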


Data conversion

Removing duplicate rows

data = pd.DataFrame({'a': [1, 3, 3, 4],
                     'b': [1, 3, 3, 5]})
print(data)
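The frame above has one repeated row (index 2 duplicates index 1). This sketch uses pandas' duplicated and drop_duplicates to find and remove it. It also shows the subset and keep parameters.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 3, 3, 4], 'b': [1, 3, 3, 5]})

# duplicated flags each row that repeats an earlier row.
print(data.duplicated())

# drop_duplicates keeps the first occurrence by default.
deduped = data.drop_duplicates()
print(deduped)

# subset= compares only the listed columns; keep='last' keeps the final occurrence.
last = data.drop_duplicates(subset=['a'], keep='last')
print(last)
```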


Replacing values

Besides the fillna method from the previous article, which fills only missing values, you can use the replace method to substitute any value with another. For example, the following replaces every 1 with 2:

data = pd.DataFrame({'a': [1, 3, 3, 4],
                     'b': [1, 3, 3, 5]})
print(data.replace(1, 2))
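replace also accepts a list of old values or a dict that maps old values to new ones. The sketch below shows both forms. The replacement values 0, 100, and 50 are arbitrary. Note that replace returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 3, 3, 4], 'b': [1, 3, 3, 5]})

# A list of old values: every 1 and every 3 becomes 0.
many_to_one = data.replace([1, 3], 0)
print(many_to_one)

# A dict: each old value gets its own replacement.
mapped = data.replace({1: 100, 5: 50})
print(mapped)
```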


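pandas can also discretize continuous values into intervals with the cut function. Here, bins=[15, 20, 25] defines the two intervals (15, 20] and (20, 25]:

data = [11, 15, 18, 20, 25, 26, 27, 24]
bins = [15, 20, 25]
print(data)
print(pd.cut(data, bins))

Result:

[11, 15, 18, 20, 25, 26, 27, 24]
[NaN, NaN, (15, 20], (15, 20], (20, 25], NaN, NaN, (20, 25]]
Categories (2, interval[int64, right]): [(15, 20] < (20, 25]]

After binning, each value is replaced by the interval that contains it. Values outside every bin (here 11, 15, 26, and 27) become NaN. The value 15 is excluded because the intervals are closed on the right, so (15, 20] does not include 15.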
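Two pd.cut parameters change this behavior. The sketch below shows them on the same data: right=False makes the intervals closed on the left, and labels= replaces the interval labels with names. The names 'low' and 'high' are hypothetical.

```python
import pandas as pd

data = [11, 15, 18, 20, 25, 26, 27, 24]
bins = [15, 20, 25]

# right=False gives [15, 20) and [20, 25): 15 is now included, 25 is not.
left_closed = pd.cut(data, bins, right=False)
print(left_closed)

# labels= names the bins instead of showing the intervals.
named = pd.cut(data, bins, labels=['low', 'high'])
print(named)
```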

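print(pd.cut(data, bins).codes)

Result:

[-1 -1  0  0  1 -1 -1  1]

The codes attribute gives the position of each value's bin among the categories (0 for (15, 20], 1 for (20, 25]). Values that fall outside every bin get -1.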

print(pd.cut(data, bins).categories)

Result:

IntervalIndex([(15, 20], (20, 25]], dtype='interval[int64, right]')

The categories attribute lists the bins themselves.

To count how many values fall into each bin, use value_counts:

print(pd.Series(pd.cut(data, bins)).value_counts())
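value_counts skips the NaN entries, so values outside every bin are not counted. This sketch counts them separately through the -1 codes and checks that the two totals account for every input value.

```python
import pandas as pd

data = [11, 15, 18, 20, 25, 26, 27, 24]
bins = [15, 20, 25]
cats = pd.cut(data, bins)

# Counts per bin, in category order; NaN (out-of-range) values are not counted.
in_bins = pd.Series(cats).value_counts(sort=False)
print(in_bins)

# Code -1 marks the values that fell outside every bin.
outside = int((cats.codes == -1).sum())
print('outside all bins:', outside)
```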


Next, let's look at randomly reordering data with permutation.

data = np.random.permutation(5)
print(data)

Result:

[1 0 4 2 3]

Here, the permutation function returns the integers 0 to 4 in random order, so the output differs from run to run.
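np.random.permutation draws from NumPy's global random state. The sketch below instead uses NumPy's Generator API (np.random.default_rng) with a fixed seed, which makes the shuffle reproducible. The seed 42 and the letters array are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility

# Each integer 0..4 appears exactly once, in random order.
order = rng.permutation(5)
print(order)

# Passing an array returns a shuffled copy of it.
letters = rng.permutation(np.array(['a', 'b', 'c', 'd']))
print(letters)

# Re-creating the generator with the same seed repeats the same shuffle.
again = np.random.default_rng(seed=42).permutation(5)
```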
Permutations can also be used to sample rows from a DataFrame.

df = pd.DataFrame(np.arange(12).reshape(4, 3))
samp = np.random.permutation(3)
print(df)
print(df.take(samp))


Here, take returns the rows of df at the positions listed in samp, in that order. Because samp is a permutation of 0 to 2, the last row (position 3) is never selected.
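Here is a sketch of two common variations. Shuffling all rows uses a permutation of len(df) positions. Drawing a random sample without replacement takes the first k positions of such a permutation. For comparison, the sketch also shows DataFrame.sample, pandas' built-in way to do the same. The seeds and k = 2 are arbitrary choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3))
rng = np.random.default_rng(0)  # arbitrary seed

# Shuffle every row: permute all len(df) positions.
shuffled = df.take(rng.permutation(len(df)))
print(shuffled)

# Sample k rows without replacement: first k positions of a permutation.
k = 2
sample = df.take(rng.permutation(len(df))[:k])
print(sample)

# pandas' built-in sampling does the same job.
sample2 = df.sample(n=k, random_state=0)
print(sample2)
```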
