Merging DataFrames (append, merge, concat)

Source: Internet
Author: User

1. pd.concat: concatenation
1.1 axis
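A compact, self-contained sketch of what axis controls. It assumes a recent pandas version and builds three 3x4 frames filled with 0, 1 and 2, like the frames in the example that follows. With axis=0 the row labels are kept and therefore repeat. With axis=1 the column labels repeat instead.

```python
import numpy as np
import pandas as pd

# Three 3x4 frames filled with 0, 1 and 2 that share the same column labels
cols = ["a", "b", "c", "d"]
frames = [pd.DataFrame(np.ones((3, 4)) * k, columns=cols) for k in range(3)]

# axis=0: stack vertically; the original row labels 0..2 are kept, so they repeat
stacked = pd.concat(frames, axis=0)
print(stacked.shape)          # (9, 4)
print(list(stacked.index))    # [0, 1, 2, 0, 1, 2, 0, 1, 2]

# axis=1: place side by side; now the column labels repeat
side_by_side = pd.concat(frames, axis=1)
print(side_by_side.shape)     # (3, 12)
```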
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])
df3 = pd.DataFrame(np.ones((3, 4)) * 2, columns=['a', 'b', 'c', 'd'])
     a    b    c    d
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
     a    b    c    d
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
     a    b    c    d
0  2.0  2.0  2.0  2.0
1  2.0  2.0  2.0  2.0
2  2.0  2.0  2.0  2.0
result = pd.concat([df1, df2, df3], axis=0)  # axis=0 concatenates vertically (stacks rows)
     a    b    c    d
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
0  2.0  2.0  2.0  2.0
1  2.0  2.0  2.0  2.0
2  2.0  2.0  2.0  2.0
result = pd.concat([df1, df2, df3], axis=1)  # axis=1 concatenates horizontally (side by side)
     a    b    c    d    a    b    c    d    a    b    c    d
0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
1  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
1.2 ignore_index
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])
df3 = pd.DataFrame(np.ones((3, 4)) * 2, columns=['a', 'b', 'c', 'd'])
result = pd.concat([df1, df2, df3], axis=0, ignore_index=True)  # ignore_index=True discards the original row labels (axis=0) or column labels (axis=1) and renumbers from 0
     a    b    c    d
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
3  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0
5  1.0  1.0  1.0  1.0
6  2.0  2.0  2.0  2.0
7  2.0  2.0  2.0  2.0
8  2.0  2.0  2.0  2.0
result = pd.concat([df1, df2, df3], axis=1, ignore_index=True)  # with axis=1, the original column labels are replaced by 0..11
     0    1    2    3    4    5    6    7    8    9   10   11
0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
1  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0  2.0
1.3 join
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['b', 'c', 'd', 'e'])
result = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True)  # join='inner' keeps only the columns present in both frames; the default join='outer' keeps all columns and fills gaps with NaN
     b    c    d
0  0.0  0.0  0.0
1  0.0  0.0  0.0
2  0.0  0.0  0.0
3  1.0  1.0  1.0
4  1.0  1.0  1.0
5  1.0  1.0  1.0
1.4 join_axes
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])
result = pd.concat([df1, df2], axis=1)
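     a    b    c    d    b    c    d    e
1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
4  NaN  NaN  NaN  NaN  1.0  1.0  1.0  1.0
result = pd.concat([df1, df2], axis=1, join_axes=[df1.index])  # join_axes=[df1.index] keeps only the rows in df1's index (pandas < 1.0 only)
# join_axes was removed in pandas 1.0; the equivalent in current versions is:
result = pd.concat([df1, df2], axis=1).reindex(df1.index)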
     a    b    c    d    b    c    d    e
1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
2. DataFrame.append()
2.1 append
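DataFrame.append only adds rows vertically. Unlike concat, it has no axis parameter. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pd.concat instead.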
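A sketch of the same append operations written with pd.concat, which also works on pandas 2.0 and later, where DataFrame.append no longer exists. It uses the same df1 and df2 as the example that follows. Turning the Series into a one-row frame with to_frame().T is one common approach, not the only one.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=["a", "b", "c", "d"], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=["b", "c", "d", "e"], index=[2, 3, 4])

# Equivalent of df1.append(df2): stack rows, take the union of the columns,
# and fill cells that have no value with NaN
res = pd.concat([df1, df2])

# Equivalent of df1.append(s1, ignore_index=True): convert the Series into a
# one-row frame (its index labels become the column labels), then concatenate
s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
res_s = pd.concat([df1, s1.to_frame().T], ignore_index=True)
print(res_s)
```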
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])
df3 = pd.DataFrame(np.ones((3, 4)) * 2, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])
res = df1.append(df2)  # pandas < 2.0; same result as pd.concat([df1, df2])
     a    b    c    d    e
1  0.0  0.0  0.0  0.0  NaN
2  0.0  0.0  0.0  0.0  NaN
3  0.0  0.0  0.0  0.0  NaN
2  NaN  1.0  1.0  1.0  1.0
3  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  1.0  1.0  1.0
res = df1.append([df2, df3], ignore_index=True)  # several frames can be appended at once, and the index can be reset
     a    b    c    d    e
0  0.0  0.0  0.0  0.0  NaN
1  0.0  0.0  0.0  0.0  NaN
2  0.0  0.0  0.0  0.0  NaN
3  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  1.0  1.0  1.0
5  NaN  1.0  1.0  1.0  1.0
6  NaN  2.0  2.0  2.0  2.0
7  NaN  2.0  2.0  2.0  2.0
8  NaN  2.0  2.0  2.0  2.0
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
res = df1.append(s1, ignore_index=True)  # a Series can be appended too; its index labels are aligned with df1's columns, and ignore_index=True is required because s1 has no name
     a    b    c    d
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
3  1.0  2.0  3.0  4.0
# Note: with ignore_index=True the index restarts from 0, although df1's own index starts from 1
l1 = [1, 2, 3, 4]
df1.loc[4] = l1  # assigning to a new label with .loc adds a row in place and keeps the existing index
     a    b    c    d
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0
4  1.0  2.0  3.0  4.0
# df1's index still starts from 1, as before
2.2 append example
df1 = pd.DataFrame({'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                    'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                    'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'E': ['E0', 'E1', 'E2', 'E3', 'E4']},
                   index=[0, 1, 2, 3, 4], columns=['B', 'D', 'A', 'E'])
df2 = pd.DataFrame({'F': ['F4', 'F5', 'F6', 'F7', 'F8'],
                    'A': ['A4', 'A5', 'A6', 'A7', 'A8'],
                    'B': ['B4', 'B5', 'B6', 'B7', 'B8'],
                    'C': ['C4', 'C5', 'C6', 'C7', 'C8']},
                   index=[5, 9, 6, 7, 10])
After running df3 = df1.append(df2), df3 is:
     A   B    C    D    E    F
0   A0  B0  NaN   D0   E0  NaN
1   A1  B1  NaN   D1   E1  NaN
2   A2  B2  NaN   D2   E2  NaN
3   A3  B3  NaN   D3   E3  NaN
4   A4  B4  NaN   D4   E4  NaN
5   A4  B4   C4  NaN  NaN   F4
9   A5  B5   C5  NaN  NaN   F5
6   A6  B6   C6  NaN  NaN   F6
7   A7  B7   C7  NaN  NaN   F7
10  A8  B8   C8  NaN  NaN   F8

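The columns have been rearranged into alphabetical order. That is how older pandas versions behave. pandas 1.0 and later keep columns in order of first appearance unless sort=True is passed. To keep df1's columns in df1's order, and drop any column that df1 does not have:
df4 = pd.concat([df1, df2], axis=0)
df4 = df4.reindex(columns=df1.columns)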
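A small check of the three column orderings discussed here. It assumes pandas 1.0 or later and uses two-row versions of df1 and df2 to stay short. The default concat keeps the order in which columns first appear, sort=True gives the alphabetical order seen in the older output, and reindex restricts the result to df1's columns in df1's order.

```python
import pandas as pd

# Two-row versions of df1 and df2 above (same column layout)
df1 = pd.DataFrame({"B": ["B0", "B1"], "D": ["D0", "D1"],
                    "A": ["A0", "A1"], "E": ["E0", "E1"]},
                   columns=["B", "D", "A", "E"])
df2 = pd.DataFrame({"F": ["F4", "F5"], "A": ["A4", "A5"],
                    "B": ["B4", "B5"], "C": ["C4", "C5"]},
                   index=[5, 9])

# Default: no sorting, columns appear in order of first appearance
print(list(pd.concat([df1, df2]).columns))             # ['B', 'D', 'A', 'E', 'F', 'C']
# sort=True: alphabetical, as in the older output above
print(list(pd.concat([df1, df2], sort=True).columns))  # ['A', 'B', 'C', 'D', 'E', 'F']
# reindex: only df1's columns, in df1's order (F and C are dropped)
df4 = pd.concat([df1, df2], axis=0).reindex(columns=df1.columns)
print(list(df4.columns))                               # ['B', 'D', 'A', 'E']
```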
3. pd.merge: merging
3.1 One key
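A minimal sketch of the default key selection described in this subsection, using cut-down versions of left and right. Without on=, merge joins on every column the two frames share. If they share none, pandas raises pandas.errors.MergeError. The renamed column other_key is a hypothetical name introduced only to remove the shared column.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"], "A": ["A0", "A1", "A2", "A3"]})
right = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"], "C": ["C0", "C1", "C2", "C3"]})

# Without on=, the shared column "key" is used automatically
implicit = pd.merge(left, right)
explicit = pd.merge(left, right, on="key")
print(implicit.equals(explicit))  # True

# Hypothetical rename so that the frames share no column at all
no_common = right.rename(columns={"key": "other_key"})
try:
    pd.merge(left, no_common)
    error = None
except pd.errors.MergeError as exc:
    error = exc
print(type(error).__name__)  # MergeError
```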
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
res = pd.merge(left, right)
res = pd.merge(left, right, on=['key'])  # by default merge joins on all columns the two frames have in common; if they share no column, it raises an error
    A   B key   C   D
0  A0  B0  K0  C0  D0
1  A1  B1  K1  C1  D1
2  A2  B2  K2  C2  D2
3  A3  B3  K3  C3  D3
3.2 Multiple keys; how can be 'left', 'right', 'inner' or 'outer'
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
    A   B key1 key2
0  A0  B0   K0   K0
1  A1  B1   K0   K1
2  A2  B2   K1   K0
3  A3  B3   K2   K1
    C   D key1 key2
0  C0  D0   K0   K0
1  C1  D1   K1   K0
2  C2  D2   K1   K0
3  C3  D3   K2   K0
res = pd.merge(left, right, on=['key1', 'key2'])  # rows are merged only where key1 and key2 both match; left has one K1/K0 row and right has two, so that row appears twice
res = pd.merge(left, right, on=['key1', 'key2'], how='inner')  # how='inner' is the default: keep only key combinations present in both frames
    A   B key1 key2   C   D
0  A0  B0   K0   K0  C0  D0
1  A2  B2   K1   K0  C1  D1
2  A2  B2   K1   K0  C2  D2
res = pd.merge(left, right, on=['key1', 'key2'], how='outer')  # how='outer' keeps every key combination from either frame; cells with no match are filled with NaN
     A    B key1 key2    C    D
0   A0   B0   K0   K0   C0   D0
1   A1   B1   K0   K1  NaN  NaN
2   A2   B2   K1   K0   C1   D1
3   A2   B2   K1   K0   C2   D2
4   A3   B3   K2   K1  NaN  NaN
5  NaN  NaN   K2   K0   C3   D3
res = pd.merge(left, right, on=['key1', 'key2'], how='left')  # how='left' keeps every key of left; the K1/K0 row is repeated because right has two matching rows
    A   B key1 key2    C    D
0  A0  B0   K0   K0   C0   D0
1  A1  B1   K0   K1  NaN  NaN
2  A2  B2   K1   K0   C1   D1
3  A2  B2   K1   K0   C2   D2
4  A3  B3   K2   K1  NaN  NaN
res = pd.merge(left, right, on=['key1', 'key2'], how='right')  # how='right' keeps every key of right
     A    B key1 key2   C   D
0   A0   B0   K0   K0  C0  D0
1   A2   B2   K1   K0  C1  D1
2   A2   B2   K1   K0  C2  D2
3  NaN  NaN   K2   K0  C3  D3
3.3 indicator
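indicator defaults to False. With indicator=True, merge adds a column named _merge that records where each row came from: the left frame only (left_only), the right frame only (right_only), or both (both).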
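A short sketch of indicator on the multi-key frames from 3.2, with the B and D columns dropped for brevity. An outer merge is used so that all three labels appear. Passing a string instead of True names the column; the name "source" is an arbitrary choice for illustration. Row order is not relied on here because it can differ between pandas versions for outer merges.

```python
import pandas as pd

left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                     "key2": ["K0", "K1", "K0", "K1"],
                     "A": ["A0", "A1", "A2", "A3"]})
right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                      "key2": ["K0", "K0", "K0", "K0"],
                      "C": ["C0", "C1", "C2", "C3"]})

# indicator=True adds a categorical "_merge" column: left_only, right_only or both
res = pd.merge(left, right, on=["key1", "key2"], how="outer", indicator=True)
print(res[["key1", "key2", "_merge"]])
print(res["_merge"].value_counts())

# A string gives the indicator column a custom name
res2 = pd.merge(left, right, on=["key1", "key2"], how="outer", indicator="source")
```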
