How to conditionally separate a cell value and add to a column using pandas

posted on 2024-11-07 20:02

For example, given testing.csv:

First Name    Last Name  Profile URL
Ashleigh      Phelps     https://www.linkedin.com/in/ashleighephelps
Jonathan                 https://www.linkedin.com/in/jonathantsegal
Camilla Innes            https://www.linkedin.com/in/camilla-innes-61213628  
Rachel                   https://www.linkedin.com/in/rachel-hudesman-335b8120
Michael                  https://www.linkedin.com/in/mikeitalia
Antonio                  https://www.linkedin.com/in/antoniomolinelli
Lauren        Zsigray    https://www.linkedin.com/in/lauren-zsigray-13b5aa25

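The code I have used only extracts the last name when the URL contains a hyphen. How can I also get the last name when it is joined to the first name, either in the First Name cell (Camilla Innes) or in the URL (jonathantsegal)?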

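import pandas as pd

df = pd.read_csv("testing.csv", sep=',', encoding="utf-8")
# Keep only the rows whose Last Name is missing
df = df[df['Last Name'].isnull()]
p = df.pop('Profile URL')
# Last path segment of the URL, e.g. "rachel-hudesman-335b8120"
tmp_df = p.str.split('/')
df['Last Name'] = tmp_df.str[-1]
# Keep the hyphen-separated parts between the first and the last one
tmp1_df = df.pop('Last Name').str.split('-')
df['Last Name'] = tmp1_df.str[1:-1].str.join(sep='-')
df = pd.concat([df, p], axis=1)
print(df)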

Which gives this output:

First Name  Last Name       Profile URL
Ashleigh    Phelps          https://www.linkedin.com/in/ashleighephelps
Jonathan                    https://www.linkedin.com/in/jonathantsegal
Camilla     Innes           https://www.linkedin.com/in/camilla-innes-61213628
Rachel      hudesman        https://www.linkedin.com/in/rachel-hudesman-335b8120
Michael                     https://www.linkedin.com/in/mikeitalia
Antonio                     https://www.linkedin.com/in/antoniomolinelli
Lauren      Zsigray         https://www.linkedin.com/in/lauren-zsigray-13b5aa25
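
The blank entries follow from the [1:-1] slice: a URL slug without a hyphen splits into a single part, so nothing is left between the first and last parts. A minimal plain-Python sketch of that one step (the helper name middle_parts is hypothetical, introduced only for illustration):

```python
def middle_parts(slug):
    """Mimic .str.split('-').str[1:-1].str.join('-') for a single slug."""
    return '-'.join(slug.split('-')[1:-1])

# Hyphenated slug: the middle part survives
print(repr(middle_parts('rachel-hudesman-335b8120')))  # 'hudesman'
# Unhyphenated slug: the slice is empty, so the result is an empty string
print(repr(middle_parts('jonathantsegal')))            # ''
```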

Expected output:

First Name  Last Name       Profile URL
Ashleigh    Phelps          https://www.linkedin.com/in/ashleighephelps
Jonathan    tsegal          https://www.linkedin.com/in/jonathantsegal
Camilla     Innes           https://www.linkedin.com/in/camilla-innes-61213628
Rachel      hudesman        https://www.linkedin.com/in/rachel-hudesman-335b8120
Michael                     https://www.linkedin.com/in/mikeitalia
Antonio     molinelli       https://www.linkedin.com/in/antoniomolinelli
Lauren      Zsigray         https://www.linkedin.com/in/lauren-zsigray-13b5aa25

What should I use to get the output in this format?

solution

Try this piece of code:

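import pandas as pd

df = pd.read_csv("testing.csv", sep=',', encoding="utf-8")

# Treat missing cells as empty strings so the string methods below work
df.fillna('', inplace=True)

def clear_data(x):
    fname = x['First Name']
    lname = x['Last Name'].strip()
    url = x['Profile URL']
    if not lname:
        # Keep only the first word, e.g. "Camilla Innes" -> "Camilla"
        fname = fname.split(' ')[0]
        # Last path segment of the URL, split on hyphens
        url_name = url.split('/')[-1].split('-')
        if len(url_name) > 1:
            # Hyphenated slug: take the part just before the trailing ID
            lname = url_name[-2].title()
        else:
            # Unhyphenated slug: the last name is whatever follows the first name
            index_of_fname = url_name[0].lower().find(fname.lower())
            if index_of_fname != -1:
                index_of_fname += len(fname)
                lname = url_name[0][index_of_fname:].title()

        x['First Name'] = fname
        x['Last Name'] = lname
    else:
        lname = lname.split('-')[0].strip()
        x['Last Name'] = lname

    return x


# apply returns a new DataFrame, so assign the result back
df = df.apply(clear_data, axis=1)

print(df)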
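
To check these rules against the sample rows without a CSV file, the same logic can be sketched as a plain function. The name guess_last_name and the inline rows list are assumptions made for illustration; the rules themselves mirror clear_data above, including the assumption that a hyphenated slug ends with an ID (first-last-id).

```python
def guess_last_name(first_name, last_name, url):
    """Return the last name, falling back to the URL slug when it is missing."""
    last_name = last_name.strip()
    if last_name:
        return last_name.split('-')[0].strip()
    # Only the first word counts as the first name
    first = first_name.split(' ')[0]
    parts = url.split('/')[-1].split('-')
    if len(parts) > 1:
        # Assumes the slug looks like first-last-id
        return parts[-2].title()
    # Unhyphenated slug: take what follows the first name, if it occurs
    pos = parts[0].lower().find(first.lower())
    if pos != -1:
        return parts[0][pos + len(first):].title()
    return ''

# Rows without a last name, copied from the sample data
rows = [
    ("Jonathan", "", "https://www.linkedin.com/in/jonathantsegal"),
    ("Camilla Innes", "", "https://www.linkedin.com/in/camilla-innes-61213628"),
    ("Rachel", "", "https://www.linkedin.com/in/rachel-hudesman-335b8120"),
    ("Michael", "", "https://www.linkedin.com/in/mikeitalia"),
    ("Antonio", "", "https://www.linkedin.com/in/antoniomolinelli"),
]
for first, last, url in rows:
    print(first.split(' ')[0], repr(guess_last_name(first, last, url)))
# Jonathan 'Tsegal'
# Camilla 'Innes'
# Rachel 'Hudesman'
# Michael ''
# Antonio 'Molinelli'
```

Note that .title() capitalizes the result (Tsegal rather than tsegal as in the expected output); drop it to keep the casing used in the URL. Michael stays blank because the slug mikeitalia does not contain the first name.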