Posted on 2024-11-07 20:02
For example
testing.csv:
First Name,Last Name,Profile URL
Ashleigh,Phelps,https://www.linkedin.com/in/ashleighephelps
Jonathan,,https://www.linkedin.com/in/jonathantsegal
Camilla,Innes,https://www.linkedin.com/in/camilla-innes-61213628
Rachel,,https://www.linkedin.com/in/rachel-hudesman-335b8120
Michael,,https://www.linkedin.com/in/mikeitalia
Antonio,,https://www.linkedin.com/in/antoniomolinelli
Lauren,Zsigray,https://www.linkedin.com/in/lauren-zsigray-13b5aa25
The code I have used only extracts a last name when the URL contains a hyphen. How can I also get the last name when it is joined directly to the first name in the URL?
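import pandas as pd

df = pd.read_csv("testing.csv", sep=',', encoding="utf-8")
# Keep only the rows whose last name is missing
df = df[df['Last Name'].isnull()]
p = df.pop('Profile URL')
# Take the last path segment of the URL, e.g. "rachel-hudesman-335b8120"
tmp_df = p.str.split('/')
df['Last Name'] = tmp_df.str[-1]
# Split on hyphens and keep the middle parts (drops the first name and the trailing ID)
tmp1_df = df.pop('Last Name').str.split('-')
df['Last Name'] = tmp1_df.str[1:-1].str.join(sep='-')
df = pd.concat([df, p], axis=1)
print(df)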
Which gives this output:
First Name  Last Name  Profile URL
Ashleigh    Phelps     https://www.linkedin.com/in/ashleighephelps
Jonathan               https://www.linkedin.com/in/jonathantsegal
Camilla     Innes      https://www.linkedin.com/in/camilla-innes-61213628
Rachel      hudesman   https://www.linkedin.com/in/rachel-hudesman-335b8120
Michael                https://www.linkedin.com/in/mikeitalia
Antonio                https://www.linkedin.com/in/antoniomolinelli
Lauren      Zsigray    https://www.linkedin.com/in/lauren-zsigray-13b5aa25
Expected output:
First Name  Last Name  Profile URL
Ashleigh    Phelps     https://www.linkedin.com/in/ashleighephelps
Jonathan    tsegal     https://www.linkedin.com/in/jonathantsegal
Camilla     Innes      https://www.linkedin.com/in/camilla-innes-13628
Rachel      hudesman   https://www.linkedin.com/in/rachel-hudesman-33
Michael                https://www.linkedin.com/in/mikeitalia
Antonio     molinelli  https://www.linkedin.com/in/antoniomolinelli
Lauren      Zsigray    https://www.linkedin.com/in/lauren-zsigray-13b5a
What should I use to get the output in this format?
Try this piece of code:
import pandas as pd

df = pd.read_csv("testing.csv", sep=',', encoding="utf-8")
# Replace NaN with '' so the string methods below work on every row
df.fillna('', inplace=True)

def clear_data(x):
    fname = x['First Name']
    lname = x['Last Name'].strip()
    url = x['Profile URL']
    if not lname:
        # Last name is missing: try to recover it from the URL
        fname = fname.split(' ')[0]
        url_name = url.split('/')[-1].split('-')
        if len(url_name) > 1:
            # Hyphenated URL: take the part just before the trailing ID
            lname = url_name[-2].title()
        else:
            # Joined URL: take whatever follows the first name
            index_of_fname = url_name[0].lower().find(fname.lower())
            if index_of_fname != -1:
                index_of_fname += len(fname)
                lname = url_name[0][index_of_fname:].title()
        x['First Name'] = fname
        x['Last Name'] = lname
    else:
        lname = lname.split('-')[0].strip()
        x['Last Name'] = lname
    return x

# apply() returns a new DataFrame, so assign the result back
df = df.apply(clear_data, axis=1)
print(df)
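As an alternative sketch (not the code from the answer above), the URL parsing can be pulled out into a small pure function that is easy to test on its own. The name `guess_last_name` is hypothetical. The sketch assumes the last path segment of the URL starts with the first name, treats any candidate containing non-letters as an ID rather than a name, and returns the guess in lowercase.

```python
def guess_last_name(first_name, profile_url):
    """Guess a last name from a LinkedIn-style profile URL.

    Hypothetical helper: returns '' when no safe guess can be made.
    """
    # The slug is the last path segment, e.g. "rachel-hudesman-335b8120".
    slug = profile_url.rstrip('/').split('/')[-1].lower()
    first = first_name.strip().lower()
    if not first or not slug.startswith(first):
        # e.g. "mikeitalia" does not start with "michael": give up.
        return ''
    # Drop the first name, then an optional hyphen, then anything after
    # the next hyphen (usually the trailing ID).
    candidate = slug[len(first):].lstrip('-').split('-')[0]
    # Accept letters only, so a bare ID is not mistaken for a name.
    return candidate if candidate.isalpha() else ''


rows = [
    ("Jonathan", "https://www.linkedin.com/in/jonathantsegal"),
    ("Rachel", "https://www.linkedin.com/in/rachel-hudesman-335b8120"),
    ("Michael", "https://www.linkedin.com/in/mikeitalia"),
    ("Antonio", "https://www.linkedin.com/in/antoniomolinelli"),
]
for first, url in rows:
    print(first, repr(guess_last_name(first, url)))
```

With pandas, something like `df.apply(lambda r: guess_last_name(r['First Name'], r['Profile URL']), axis=1)` could fill the rows whose last name is empty. Rows such as Michael, where the URL does not contain the first name, stay empty, which matches the expected output in the question.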
Author:qs
link:http://www.pythonblackhole.com/blog/article/246856/cd3c71a6a69658d78746/
source:python black hole net