Sort each row individually between two columns

Posted on 2024-12-02 22:04


I have the following pandas dataframe:

column_01   column_02   value  
ccc         aaa         1
bbb         ddd         34
ddd         aaa         98

I need to reorganise the dataframe so that, in each row, column_01 contains whichever of the two values (column_01 or column_02) comes first alphabetically, and column_02 contains the other. The output for the above example would be:

column_01   column_02   value
aaa         ccc         1
bbb         ddd         34
aaa         ddd         98

I could obviously do this by iterating over the dataframe one row at a time, comparing column_01 to column_02 and swapping them if necessary. The problem is that the dataframe is quite big (over 1 million rows), so a row-by-row loop would be very slow.

Is there a way to do this without iterating over every row individually?


solution
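
To try the snippets below, the example frame from the question can be rebuilt like this. This is only a minimal sketch: the values are copied from the question, and the name df matches the one the answer uses.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "column_01": ["ccc", "bbb", "ddd"],
    "column_02": ["aaa", "ddd", "aaa"],
    "value": [1, 34, 98],
})
print(df)
```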


You can use:

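# result_type='expand' turns each sorted list back into columns
# (needed since pandas 0.23, where apply otherwise returns a Series of lists).
df[['column_01','column_02']] = df[['column_01','column_02']].apply(
    lambda x: sorted(x.values), axis=1, result_type='expand')
print(df)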
   column_01 column_02  value
0       aaa       ccc      1
1       bbb       ddd     34
2       aaa       ddd     98

Another solution, using np.sort:

import numpy as np
import pandas as pd
df[['column_01','column_02']] = pd.DataFrame(np.sort(df[['column_01','column_02']].values),
                                 index=df.index, columns=['column_01','column_02'])

Or assign the sorted NumPy array directly:

df[['column_01','column_02']] = np.sort(df[['column_01','column_02']].values)
print (df)
  column_01 column_02  value
0       aaa       ccc      1
1       bbb       ddd     34
2       aaa       ddd     98
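
The reason np.sort gives a per-row result is that it sorts along the last axis by default. The sketch below illustrates this on a small stand-alone array; the array pairs is a hypothetical stand-in for df[['column_01','column_02']].values, not something from the original answer.

```python
import numpy as np

# Hypothetical stand-in for df[['column_01','column_02']].values.
pairs = np.array([["ccc", "aaa"],
                  ["bbb", "ddd"],
                  ["ddd", "aaa"]], dtype=object)

# Default axis=-1: each row is sorted on its own (same as axis=1 here).
row_sorted = np.sort(pairs)

# axis=0 sorts each column independently, which would break the row pairing.
col_sorted = np.sort(pairs, axis=0)

print(row_sorted.tolist())  # [['aaa', 'ccc'], ['bbb', 'ddd'], ['aaa', 'ddd']]
print(col_sorted.tolist())  # [['bbb', 'aaa'], ['ccc', 'aaa'], ['ddd', 'ddd']]
```

So the default axis is the right one for this problem; passing axis=0 would mix values from different rows.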

The NumPy-based solutions are much faster, because apply with axis=1 calls the Python function once per row. Timings on 3,000 rows:

df = pd.concat([df]*1000).reset_index(drop=True)
In [177]: %timeit df[['column_01','column_02']] = pd.DataFrame(np.sort(df[['column_01','column_02']].values), index=df.index, columns=['column_01','column_02'])
1000 loops, best of 3: 1.36 ms per loop

In [182]: %timeit df[['column_01','column_02']] = np.sort(df[['column_01','column_02']].values)
1000 loops, best of 3: 1.54 ms per loop

In [178]: %timeit df[['column_01','column_02']] = (df[['column_01','column_02']].apply(lambda x: sorted(x.values), axis=1))
1 loop, best of 3: 291 ms per loop
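
Outside IPython, a similar comparison can be run with the standard timeit module. This is a rough sketch only: the helper names with_numpy and with_apply are made up for illustration, result_type='expand' is added so that apply works on current pandas, and the absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

cols = ["column_01", "column_02"]
base = pd.DataFrame({
    "column_01": ["ccc", "bbb", "ddd"],
    "column_02": ["aaa", "ddd", "aaa"],
    "value": [1, 34, 98],
})
big = pd.concat([base] * 1000).reset_index(drop=True)  # 3,000 rows

def with_numpy(frame):
    # Hypothetical helper: row-wise sort via np.sort on the underlying array.
    out = frame.copy()
    out[cols] = np.sort(out[cols].values)
    return out

def with_apply(frame):
    # Hypothetical helper: row-wise sort via apply, one Python call per row.
    out = frame.copy()
    out[cols] = out[cols].apply(
        lambda row: sorted(row.values), axis=1, result_type="expand")
    return out

# Average seconds per call; few repetitions because apply is slow.
t_numpy = timeit.timeit(lambda: with_numpy(big), number=10) / 10
t_apply = timeit.timeit(lambda: with_apply(big), number=3) / 3
print(f"np.sort: {t_numpy * 1e3:.2f} ms per call")
print(f"apply:   {t_apply * 1e3:.2f} ms per call")
```

Both helpers copy the frame first, so the timings include the copy; that cost is the same for both and does not change the comparison.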


Author:qs

link:http://www.pythonblackhole.com/blog/article/247229/2d82f396bb62f797e16e/

source:python black hole net
