posted on 2023-05-07 21:12
This article does not describe merging in terms of "columns" and "rows". To make it more intuitive, it uses "left and right" (side by side) and "up and down" (stacked) instead.
The append() function adds the rows of another DataFrame to the end of the given DataFrame, i.e. it joins them up and down, and returns a new DataFrame object; the original DataFrame is not modified. Columns that exist only in the appended DataFrame are added to the result, and cells that have no value are filled with NaN.
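import pandas as pd

# DataFrame.append() only exists in pandas < 2.0
df1 = pd.DataFrame({"x": [15, 25, 37, 42],
                    "y": [24, 38, 18, 45]})
df2 = pd.DataFrame({"x": [15, 25, 37],
                    "y": [24, 38, 45]})
df = df1.append(df2)  # stack df2 below df1; the original row labels are kept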
print('******************df1*******************')
print(df1)
print('******************df2*******************')
print(df2)
print('******************df*******************')
print(df)
Note: in pandas 1.4 and later, calling append() emits a FutureWarning because the method is deprecated, and it was removed entirely in pandas 2.0. The official recommendation is to use concat() instead:
df = pd.concat([df1, df2], ignore_index=False)
df1 = pd.DataFrame({"x": [15, 25, 37, 42],
                    "y": [24, 38, 18, 45]})
df2 = pd.DataFrame({"x": [25, 15, 12],
                    "y": [47, 24, 17],
                    "z": [38, 12, 45]})
df = df1.append(df2)  # z exists only in df2, so df1's rows get NaN in z
print('******************df1*******************')
print(df1)
print('******************df2*******************')
print(df2)
print('******************df*******************')
print(df)
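On pandas 2.0 or later, where append() no longer exists, the stacking above can be sketched with pd.concat. This is a minimal sketch reusing the same df1/df2 data; it shows that the new column z is filled with NaN for df1's rows and what ignore_index=True changes.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [15, 25, 37, 42],
                    "y": [24, 38, 18, 45]})
df2 = pd.DataFrame({"x": [25, 15, 12],
                    "y": [47, 24, 17],
                    "z": [38, 12, 45]})

# Stack df2 under df1; columns are unioned (join="outer" is the default),
# so df1's rows get NaN in z and the row labels 0-3, 0-2 are kept
df = pd.concat([df1, df2])
print(df)

# ignore_index=True renumbers the rows 0..n-1 instead
df_renumbered = pd.concat([df1, df2], ignore_index=True)
print(df_renumbered)
```

Keeping the original labels (the default) leaves duplicate index values, which matters if you later select rows with .loc.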
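By default, join() uses the index as the key: rows with the same index label are placed side by side, i.e. merged left and right. The default is a left join (how="left"), so every row of the calling DataFrame is kept, and cells with no match in the other DataFrame are filled with NaN.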
df1 = pd.DataFrame({"A": ["A0", "A1", "A1"], "B": ["B0", "B1", "B2"]},
                   index=["K0", "K1", "K2"])
df2 = pd.DataFrame({"C": ["C1", "C2", "C3"], "D": ["D0", "D1", "D2"]},
                   index=["K0", "K1", "K3"])
df3 = df1.join(df2)               # based on df1's index; labels missing from df2 get NaN
df4 = df1.join(df2, how="outer")  # based on the union of both indexes; missing data is also filled with NaN
df5 = df1.join(df2, how="inner")  # based on the intersection: only labels present in both df1 and df2 are joined
print("******************df1*******************")
print(df1)
print("******************df2*******************")
print(df2)
print("******************df3*******************")
print(df3)
print("******************df4*******************")
print(df4)
print("******************df5*******************")
print(df5)
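To see that join() behaves like an index-aligned merge, the sketch below (same data as above) compares df1.join(df2) with pd.merge(df1, df2, left_index=True, right_index=True, how="left"). That the two frames come out identical is what current pandas produces for a single-DataFrame join; treat it as an illustration rather than a guarantee for every join option.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A1"], "B": ["B0", "B1", "B2"]},
                   index=["K0", "K1", "K2"])
df2 = pd.DataFrame({"C": ["C1", "C2", "C3"], "D": ["D0", "D1", "D2"]},
                   index=["K0", "K1", "K3"])

# join() aligns on the index and defaults to how="left"
joined = df1.join(df2)

# The same left join written with merge(), matching index to index
merged = pd.merge(df1, df2, left_index=True, right_index=True, how="left")

print(joined)
print(joined.equals(merged))
```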
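pandas.concat(objs, axis=0, join='outer', ignore_index=False)
concat() can join up and down (axis=0, the default) or left and right (axis=1); used up and down, it works like append(). join='outer' keeps the union of the labels on the other axis and fills gaps with NaN, join='inner' keeps only the labels the objects share, and ignore_index=True renumbers the result from 0 instead of keeping the original labels.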
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3'],
                    'E': ['E0', 'E1', 'E2', 'E3']})
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7'],
                    'F': ['F4', 'F5', 'F6', 'F7']})
df3 = pd.concat([df1, df2], ignore_index=True)                # up and down; E and F are kept, gaps filled with NaN
df4 = pd.concat([df1, df2], ignore_index=True, join="inner")  # up and down; only the shared columns A-D are kept
df5 = pd.concat([df1, df2], axis=1)                           # left and right, aligned on the row index
print("******************df1*******************")
print(df1)
print("******************df2*******************")
print(df2)
print("******************df3*******************")
print(df3)
print("******************df4*******************")
print(df4)
print("******************df5*******************")
print(df5)
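In the left-and-right case, df1 and df2 line up row by row only because both use the default index 0-3. The sketch below uses two small hypothetical frames, left and right, whose indexes differ, to show that axis=1 aligns rows by index label, much like a join on the index.

```python
import pandas as pd

# Hypothetical frames with partially overlapping indexes
left = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
right = pd.DataFrame({"B": ["B1", "B2"]}, index=[1, 2])

# axis=1 places the frames side by side and aligns rows by index label;
# with the default join="outer", labels missing on one side become NaN
outer = pd.concat([left, right], axis=1)

# join="inner" keeps only the index labels present in every frame
inner = pd.concat([left, right], axis=1, join="inner")

print(outer)
print(inner)
```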
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None)
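import numpy as np
import pandas as pd

df1 = pd.DataFrame({"key": ["one", "two", "two"],
                    "data1": np.arange(3)})
df2 = pd.DataFrame({"key": ["one", "three", "three"],
                    "data2": np.arange(3)})
df3 = pd.merge(df1, df2)  # inner join by default: keep only rows whose values in the shared column(s) match in both DataFrames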
print("******************df1*******************")
print(df1)
print("******************df2*******************")
print(df2)
print("******************df3*******************")
print(df3)
df1 = pd.DataFrame({"key": ["one", "two", "two"], "data1": np.arange(3)})
df2 = pd.DataFrame({"key": ["one", "three", "three"], "data2": np.arange(3)})
df3 = pd.merge(df1, df2, how="left")   # left join: keep every row of df1; fill NaN where df2 has no matching data
df4 = pd.merge(df1, df2, how="right")  # right join: keep every row of df2; fill NaN where df1 has no matching data
df5 = pd.merge(df1, df2, how="outer")  # outer join: keep all rows of both DataFrames; missing data is filled with NaN
print("******************df1*******************")
print(df1)
print("******************df2*******************")
print(df2)
print("******************df3*******************")
print(df3)
print("******************df4*******************")
print(df4)
print("******************df5*******************")
print(df5)
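To see which side each row of an outer merge came from, merge() accepts indicator=True, which adds a categorical _merge column with the values "both", "left_only" and "right_only". A minimal sketch on the same df1/df2:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"key": ["one", "two", "two"], "data1": np.arange(3)})
df2 = pd.DataFrame({"key": ["one", "three", "three"], "data2": np.arange(3)})

# indicator=True records the origin of every row in a "_merge" column
df = pd.merge(df1, df2, how="outer", indicator=True)
print(df)
print(df["_merge"].value_counts())
```

Filtering on _merge == "left_only" is a convenient way to find the rows of df1 that have no partner in df2.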
In the examples above, both DataFrames share a column with the same name. When they do not, but you still want to merge (join-query) the two DataFrames by matching column A of the first against column B of the second, use the left_on and right_on parameters.
df1 = pd.DataFrame({'key1':['X','Y','Z'], 'value1':[1,2,3]})
df2 = pd.DataFrame({'key2':['A','B','Z'], 'value2':[4,5,6]})
df3 = pd.merge(df1, df2, left_on='key1', right_on='key2')
print("******************df1*******************")
print(df1)
print("******************df2*******************")
print(df2)
print("******************df3*******************")
print(df3)
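Because left_on/right_on match two differently named columns, both key1 and key2 are kept in the result. A common follow-up, sketched here on the same data, is to drop the redundant key column:

```python
import pandas as pd

df1 = pd.DataFrame({'key1': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key2': ['A', 'B', 'Z'], 'value2': [4, 5, 6]})

# Only 'Z' appears in both key columns, so the inner merge has one row;
# key2 duplicates key1 after the match, so it is dropped
df3 = pd.merge(df1, df2, left_on='key1', right_on='key2').drop(columns='key2')
print(df3)
```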
left = pd.DataFrame({'sno': [11, 12, 13, 14],
                     'name': ['name_a', 'name_b', 'name_c', 'name_d']})
right = pd.DataFrame({'sno': [11, 12, 13, 14],
                      'age': ['21', '22', '23', '24']})
df = pd.merge(left, right, on='sno')  # one-to-one: each sno appears once on each side
print("******************left*******************")
print(left)
print("******************right*******************")
print(right)
print("******************df*******************")
print(df)
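merge() can also check the relationship you expect through its validate parameter. The sketch below asserts a one-to-one relationship on the data above; dup_right is a hypothetical frame with a duplicated sno, added only to show the pandas.errors.MergeError that validate raises instead of silently multiplying rows.

```python
import pandas as pd

left = pd.DataFrame({'sno': [11, 12, 13, 14],
                     'name': ['name_a', 'name_b', 'name_c', 'name_d']})
right = pd.DataFrame({'sno': [11, 12, 13, 14],
                      'age': ['21', '22', '23', '24']})

# Passes: sno is unique on both sides
df = pd.merge(left, right, on='sno', validate='one_to_one')
print(df)

# Hypothetical right-hand frame where sno 11 appears twice
dup_right = pd.DataFrame({'sno': [11, 11], 'age': ['21', '31']})
rejected = False
try:
    pd.merge(left, dup_right, on='sno', validate='one_to_one')
except pd.errors.MergeError as exc:
    rejected = True
    print("rejected:", exc)
```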
left = pd.DataFrame({'sno': [11, 12, 13, 14],
                     'name': ['name_a', 'name_b', 'name_c', 'name_d']})
right = pd.DataFrame({'sno': [11, 11, 11, 12, 12, 13],
                      'grade': ['Chinese88', 'Math90', 'English75', 'Chinese66', 'Math55', 'English29']})
df = pd.merge(left, right, on='sno')  # one-to-many: each left row is repeated once per matching right row, so the row count follows the larger side
print("******************left*******************")
print(left)
print("******************right*******************")
print(right)
print("******************df*******************")
print(df)
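The default inner merge silently drops students that have no grade rows (sno 14 here). A sketch comparing the inner merge with how='left' on the same data (grade labels in English):

```python
import pandas as pd

left = pd.DataFrame({'sno': [11, 12, 13, 14],
                     'name': ['name_a', 'name_b', 'name_c', 'name_d']})
right = pd.DataFrame({'sno': [11, 11, 11, 12, 12, 13],
                      'grade': ['Chinese88', 'Math90', 'English75',
                                'Chinese66', 'Math55', 'English29']})

# Inner merge: sno 14 has no grades, so it disappears
inner = pd.merge(left, right, on='sno')
# Left merge: every student is kept; sno 14 gets NaN for grade
left_join = pd.merge(left, right, on='sno', how='left')

# Each student appears once per matching grade row
print(inner.groupby('sno').size())
print(left_join)
```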
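left = pd.DataFrame({'sno': [11, 11, 12, 12, 12],
                     'hobby': ['basketball', 'badminton', 'table tennis', 'basketball', 'football']})
right = pd.DataFrame({'sno': [11, 11, 11, 12, 12, 13],
                      'grade': ['Chinese88', 'Math90', 'English75', 'Chinese66', 'Math55', 'English29']})
df = pd.merge(left, right, on='sno')  # many-to-many: every matching left row is paired with every matching right row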
print("******************left*******************")
print(left)
print("******************right*******************")
print(right)
print("******************df*******************")
print(df)
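The size of a many-to-many merge can be predicted before running it: for an inner merge, each key contributes (rows with that key in left) × (rows with that key in right). On the data above, sno 11 gives 2 × 3 = 6 rows, sno 12 gives 3 × 2 = 6 rows, and sno 13 has no left rows, so 12 rows in total. A minimal sketch of that check:

```python
import pandas as pd

left = pd.DataFrame({'sno': [11, 11, 12, 12, 12],
                     'hobby': ['basketball', 'badminton', 'table tennis',
                               'basketball', 'football']})
right = pd.DataFrame({'sno': [11, 11, 11, 12, 12, 13],
                      'grade': ['Chinese88', 'Math90', 'English75',
                                'Chinese66', 'Math55', 'English29']})

# Per-key row counts on each side; multiplying aligns them on sno,
# and keys missing from one side produce NaN, which is dropped
left_counts = left['sno'].value_counts()
right_counts = right['sno'].value_counts()
expected = int((left_counts * right_counts).dropna().sum())

df = pd.merge(left, right, on='sno')
print(expected, len(df))
```

Running this check first is a cheap guard against an accidental many-to-many key that would blow up the result size.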
Author:cindy
link:http://www.pythonblackhole.com/blog/article/376/21e5171b3c6c8102e755/
source:python black hole net