Merging DataFrames (append, merge, concat)

Last Update:2018-08-09 Source: Internet

Author: User


1. pd.concat: concatenation
1.1 axis
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])
df3 = pd.DataFrame(np.ones((3, 4)) * 2, columns=['a', 'b', 'c', 'd'])
     a    b    c    d
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
     a    b    c    d
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
     a    b    c    d
0  2.0  2.0  2.0  2.0
1  2.0  2.0  2.0  2.0
2  2.0  2.0  2.0  2.0
result = pd.concat([df1, df2, df3], axis=0)  # axis=0 stacks the frames vertically (row-wise)
     a    b    c    d
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
0  2.0  2.0  2.0  2.0
1  2.0  2.0  2.0  2.0
2  2.0  2.0  2.0  2.0
result = pd.concat([df1, df2, df3], axis=1)  # axis=1 joins the frames horizontally (column-wise)
     a    b    c    d    a    b    c    d    a    b    c    d
0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
1  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
1.2 ignore_index
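As a brief illustration of why this option matters, the sketch below stacks two small frames with and without ignore_index. The frame names (top, bottom) and their sizes are made up for the example and are not part of the original article.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames that both use the default 0..2 row index.
top = pd.DataFrame(np.zeros((3, 2)), columns=['a', 'b'])
bottom = pd.DataFrame(np.ones((3, 2)), columns=['a', 'b'])

# Without ignore_index the row labels repeat: 0, 1, 2, 0, 1, 2.
stacked = pd.concat([top, bottom], axis=0)
rows_labelled_0 = stacked.loc[0]   # two rows share label 0, so this is a DataFrame

# With ignore_index=True the labels are replaced by 0..5.
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)
row_0 = renumbered.loc[0]          # label 0 is unique, so this is a single row (Series)
```

Repeated labels are legal in pandas, but label-based lookups then return several rows, which is often not what is wanted after stacking.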
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])
df3 = pd.DataFrame(np.ones((3, 4)) * 2, columns=['a', 'b', 'c', 'd'])
result = pd.concat([df1, df2, df3], axis=0, ignore_index=True)  # ignore_index=True discards the original labels along the concatenation axis: the row index for axis=0, the column names for axis=1
     a    b    c    d
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
3  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0
5  1.0  1.0  1.0  1.0
6  2.0  2.0  2.0  2.0
7  2.0  2.0  2.0  2.0
8  2.0  2.0  2.0  2.0
result = pd.concat([df1, df2, df3], axis=1, ignore_index=True)  # with axis=1, the original column names are replaced by 0..11
     0    1    2    3    4    5    6    7    8    9   10   11
0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
1  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
1.3 join
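For comparison, here is a minimal sketch of the two values join accepts, 'outer' (the default) and 'inner', on frames whose columns only partly overlap. The frame names and shapes are illustrative assumptions, not taken from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical frames sharing only columns 'b' and 'c'.
left = pd.DataFrame(np.zeros((2, 3)), columns=['a', 'b', 'c'])
right = pd.DataFrame(np.ones((2, 3)), columns=['b', 'c', 'd'])

# join='outer' keeps the union of the columns and fills the gaps with NaN.
outer = pd.concat([left, right], axis=0, join='outer', ignore_index=True)

# join='inner' keeps only the columns present in every frame.
inner = pd.concat([left, right], axis=0, join='inner', ignore_index=True)
```

Both results have four rows; they differ only in which columns survive.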
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['b', 'c', 'd', 'e'])
result = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True)  # join='inner' keeps only the columns that appear in every frame
     b    c    d
0  0.0  0.0  0.0
1  0.0  0.0  0.0
2  0.0  0.0  0.0
3  1.0  1.0  1.0
4  1.0  1.0  1.0
5  1.0  1.0  1.0
1.4 join_axes
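The join_axes parameter was deprecated in pandas 0.25 and removed in pandas 1.0, so calls that use it only run on older versions. A minimal sketch of getting the same rows on a newer pandas, assuming the goal is "keep only df1's index", is to concatenate first and then reindex:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4)), columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)), columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])

# Align both frames on the union of their row labels (1, 2, 3, 4),
# then keep only the rows whose labels appear in df1's index.
res = pd.concat([df1, df2], axis=1).reindex(df1.index)
```

Row 4, which exists only in df2, is dropped; row 1 keeps NaN in the columns that came from df2.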
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])
result = pd.concat([df1, df2], axis=1)
     a    b    c    d    b    c    d    e
1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
4  NaN  NaN  NaN  NaN  1.0  1.0  1.0  1.0
result = pd.concat([df1, df2], axis=1, join_axes=[df1.index])  # join_axes=[df1.index] keeps only the rows in df1's index (join_axes was removed in pandas 1.0, so this call needs an older version)
     a    b    c    d    b    c    d    e
1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
2. DataFrame.append()
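DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. The sketch below shows rough pd.concat equivalents of the three append patterns covered in this section; the variable names stacked, renumbered and with_row are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4)), columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)), columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Roughly df1.append(df2): rows stacked, original row labels kept.
stacked = pd.concat([df1, df2])

# Roughly df1.append([df2, df2], ignore_index=True): several frames, index reset.
renumbered = pd.concat([df1, df2, df2], ignore_index=True)

# Roughly df1.append(s1, ignore_index=True): turn the Series into a
# one-row frame first, so its labels line up with df1's columns.
with_row = pd.concat([df1, s1.to_frame().T], ignore_index=True)
```

Because pd.concat takes a list, adding many frames in one call also avoids building a new frame for every appended piece.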
2.1 append
DataFrame.append only adds rows (vertical stacking); it has no axis parameter. It was removed in pandas 2.0, so the calls below need an older version.
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])
df3 = pd.DataFrame(np.ones((3, 4)) * 2, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])
res = df1.append(df2)
     a    b    c    d    e
1  0.0  0.0  0.0  0.0  NaN
2  0.0  0.0  0.0  0.0  NaN
3  0.0  0.0  0.0  0.0  NaN
2  NaN  1.0  1.0  1.0  1.0
3  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  1.0  1.0  1.0
res = df1.append([df2, df3], ignore_index=True)  # several frames can be appended at once, and the index can be reset
     a    b    c    d    e
0  0.0  0.0  0.0  0.0  NaN
1  0.0  0.0  0.0  0.0  NaN
2  0.0  0.0  0.0  0.0  NaN
3  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  1.0  1.0  1.0
5  NaN  1.0  1.0  1.0  1.0
6  NaN  2.0  2.0  2.0  2.0
7  NaN  2.0  2.0  2.0  2.0
8  NaN  2.0  2.0  2.0  2.0
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
res = df1.append(s1, ignore_index=True)  # a Series can be appended too; an unnamed Series requires ignore_index=True, and its index labels should match df1's columns
     a    b    c    d
0  0.0  0.0  0.0  0.0  # note: with ignore_index=True the index restarts from 0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
3  1.0  2.0  3.0  4.0
l1 = [1, 2, 3, 4]
df1.loc[4] = l1  # assigning to a new label adds a row in place
     a    b    c    d
1  0.0  0.0  0.0  0.0  # df1 keeps its original index, starting from 1
2  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0
4  1.0  2.0  3.0  4.0
2.2 append example
df1 = pd.DataFrame({'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                    'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                    'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'E': ['E0', 'E1', 'E2', 'E3', 'E4']},
                   index=[0, 1, 2, 3, 4], columns=['B', 'D', 'A', 'E'])
df2 = pd.DataFrame({'F': ['F4', 'F5', 'F6', 'F7', 'F8'],
                    'A': ['A4', 'A5', 'A6', 'A7', 'A8'],
                    'B': ['B4', 'B5', 'B6', 'B7', 'B8'],
                    'C': ['C4', 'C5', 'C6', 'C7', 'C8']},
                   index=[5, 9, 6, 7, 10])
After running df3 = df1.append(df2), df3 becomes:
     A   B    C    D    E    F
0   A0  B0  NaN   D0   E0  NaN
1   A1  B1  NaN   D1   E1  NaN
2   A2  B2  NaN   D2   E2  NaN
3   A3  B3  NaN   D3   E3  NaN
4   A4  B4  NaN   D4   E4  NaN
5   A4  B4   C4  NaN  NaN   F4
9   A5  B5   C5  NaN  NaN   F5
6   A6  B6   C6  NaN  NaN   F6
7   A7  B7   C7  NaN  NaN   F7
10  A8  B8   C8  NaN  NaN   F8

The columns come out reordered. To keep df1's column order, and to drop any column that df1 does not have:
df4 = pd.concat([df1, df2], axis=0)
df4 = df4.reindex(columns=df1.columns)  # reindex returns a new frame, so assign the result
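The same idea as a self-contained sketch on smaller, made-up frames (the values and the two-row size are assumptions chosen to keep the example short):

```python
import pandas as pd

# Hypothetical frames: df1 fixes the column order B, D, A; df2 adds column F.
df1 = pd.DataFrame({'B': ['B0', 'B1'], 'D': ['D0', 'D1'], 'A': ['A0', 'A1']},
                   columns=['B', 'D', 'A'])
df2 = pd.DataFrame({'F': ['F2', 'F3'], 'A': ['A2', 'A3'], 'B': ['B2', 'B3']},
                   index=[2, 3])

df4 = pd.concat([df1, df2], axis=0)

# Keep only df1's columns, in df1's order; column F is dropped, and
# column D is NaN for the rows that came from df2.
df4 = df4.reindex(columns=df1.columns)
```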
3. pd.merge: merging
3.1 One key
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
res = pd.merge(left, right)  # by default, merge joins on the columns common to both frames; it raises an error if there are none
res = pd.merge(left, right, on=['key'])  # same result here, with the key named explicitly
    A   B key   C   D
0  A0  B0  K0  C0  D0
1  A1  B1  K1  C1  D1
2  A2  B2  K2  C2  D2
3  A3  B3  K3  C3  D3
3.2 Multiple keys; how can be one of ['left', 'right', 'inner', 'outer']
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
    A   B key1 key2
0  A0  B0   K0   K0
1  A1  B1   K0   K1
2  A2  B2   K1   K0
3  A3  B3   K2   K1
    C   D key1 key2
0  C0  D0   K0   K0
1  C1  D1   K1   K0
2  C2  D2   K1   K0
3  C3  D3   K2   K0
res = pd.merge(left, right, on=['key1', 'key2'])  # rows merge only where both key1 and key2 match; left has one K1/K0 row and right has two, so that left row appears twice
res = pd.merge(left, right, on=['key1', 'key2'], how='inner')  # how='inner' is the default: keep only the key combinations present in both frames
    A   B key1 key2   C   D
0  A0  B0   K0   K0  C0  D0
1  A2  B2   K1   K0  C1  D1
2  A2  B2   K1   K0  C2  D2
res = pd.merge(left, right, on=['key1', 'key2'], how='outer')  # keep every key combination from either frame; values missing on one side become NaN
     A    B key1 key2    C    D
0   A0   B0   K0   K0   C0   D0
1   A1   B1   K0   K1  NaN  NaN
2   A2   B2   K1   K0   C1   D1
3   A2   B2   K1   K0   C2   D2
4   A3   B3   K2   K1  NaN  NaN
5  NaN  NaN   K2   K0   C3   D3
res = pd.merge(left, right, on=['key1', 'key2'], how='left')  # keep every row of left; the K1/K0 row appears twice because right has two matching rows
    A   B key1 key2    C    D
0  A0  B0   K0   K0   C0   D0
1  A1  B1   K0   K1  NaN  NaN
2  A2  B2   K1   K0   C1   D1
3  A2  B2   K1   K0   C2   D2
4  A3  B3   K2   K1  NaN  NaN
res = pd.merge(left, right, on=['key1', 'key2'], how='right')  # keep every row of right
     A    B key1 key2   C   D
0   A0   B0   K0   K0  C0  D0
1   A2   B2   K1   K0  C1  D1
2   A2   B2   K1   K0  C2  D2
3  NaN  NaN   K2   K0  C3  D3
3.3 indicator
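indicator defaults to False. When it is True, the result gains a _merge column showing whether each row's key was found only in the left frame (left_only), only in the right frame (right_only), or in both (both).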
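A minimal sketch of the option, using two small made-up frames whose keys only partly overlap (the names left, right, origin and the column name 'source' are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frames: K0 exists only on the left, K3 only on the right.
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'C': ['C1', 'C2', 'C3']})

# indicator=True adds a categorical column named _merge.
res = pd.merge(left, right, on='key', how='outer', indicator=True)

# Map each key to where it was found, as plain strings.
origin = dict(zip(res['key'], res['_merge'].astype(str)))

# Passing a string instead of True sets the name of the indicator column.
res_named = pd.merge(left, right, on='key', how='outer', indicator='source')
```

Here origin maps K0 to 'left_only', K1 and K2 to 'both', and K3 to 'right_only'. The column is most informative with how='outer', since an inner merge only ever produces 'both'.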

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
