Find the index of certain values in a data frame and put it as a separate column

posted on 2024-12-02 22:06


In the following data frame DF, each user has a different number of rows in the Movie and Exist columns. For example, user 2 has 10 rows and user 5 has 9. For each user, I want the position of the first True value in the Exist column, counted within that user's rows, divided by the number of rows for that user. The result should go in a separate data frame alongside the User ID. Imagine this is the data frame:

    User    Movie       Exist
0   2       172         False
1   2       2717        False
2   2       150         False
3   2       2700        False
4   2       2699        True
5   2       2616        False
6   2       112         False
7   2       2571        True
8   2       2657        True
9   2       2561        False
10  5       3471        False
11  5       187         False
12  5       2985        False
13  5       3388        False
14  5       3418        False
15  5       32          False
16  5       1673        False
17  5       3740        True
18  5       1693        False

So the target data frame should look like this:

5/10 = 0.5
8/9 ≈ 0.89


User  Location
 2      0.5
 5      0.89

The first True value for user 2 is at relative position 5 (the 5th value in user 2's rows), and the first True value for user 5 is at relative position 8 (the 8th value in user 5's rows). Note that I don't want the actual index labels, which are 4 and 17.
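To make the question reproducible, the sample table can be rebuilt as a data frame and the expected value for user 2 checked by hand. This is only a sketch: the variable names `exist_2`, `position_2`, and `ratio_2` are introduced here for illustration and do not come from the original post.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "User": [2] * 10 + [5] * 9,
    "Movie": [172, 2717, 150, 2700, 2699, 2616, 112, 2571, 2657, 2561,
              3471, 187, 2985, 3388, 3418, 32, 1673, 3740, 1693],
    "Exist": [False, False, False, False, True, False, False, True, True, False,
              False, False, False, False, False, False, False, True, False],
})

# Hand check for user 2: 1-based position of the first True over the group size
exist_2 = df.loc[df["User"] == 2, "Exist"].tolist()
position_2 = exist_2.index(True) + 1   # 5th value in user 2's rows
ratio_2 = position_2 / len(exist_2)    # 5 / 10
print(position_2, len(exist_2), ratio_2)
```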


solution


Option 1

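def first_ratio(x):
    # Renumber the group's rows 0..n-1 so idxmax gives a position within the group
    x = x.reset_index(drop=True)
    # 1-based position of the first True; x.any() zeroes it when the group has no True
    i = x.any() * (x.idxmax() + 1.)
    l = len(x)
    return i / l

df.groupby('User').Exist.apply(first_ratio).rename('Location').to_frame()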

      Location
User          
2     0.500000
5     0.888889
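The reason Option 1 resets the index first can be seen on a single group. The sketch below rebuilds user 5's Exist values with their original row labels (10 to 18) and shows how `idxmax` changes from the label in the full frame to a position within the group. The variable `position` is a name chosen here for illustration.

```python
import pandas as pd

# Exist values for user 5, keeping the original row labels 10..18
x = pd.Series(
    [False, False, False, False, False, False, False, True, False],
    index=range(10, 19),
)

label_in_frame = x.idxmax()            # label of the first True in the full frame
x = x.reset_index(drop=True)           # labels become 0..8
position = x.idxmax() + 1              # 1-based position within the group
print(label_in_frame, position, position / len(x))
```

Here the label in the full frame is 17, while the position within the group is 8, which gives 8/9 after dividing by the group length.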

Option 2

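def first_ratio(x):
    # Work on the underlying NumPy array, so positions are already 0-based within the group
    v = x.values
    # argmax returns the first True; v.any() zeroes the result when there is no True
    i = v.any() * (v.argmax() + 1.)
    l = v.shape[0]
    return i / l

df.groupby('User').Exist.apply(first_ratio).rename('Location').to_frame()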
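Both options multiply by `any()`, which matters when a user has no True value at all: `argmax` would still return 0, so without the guard such a user would get 1/n instead of 0. A small sketch of that edge case follows, using Option 2. User 7 is a made-up group with no True value, added only for illustration and not part of the original data.

```python
import pandas as pd

def first_ratio(x):
    v = x.values
    # v.any() is False for a group with no True, which forces the result to 0
    i = v.any() * (v.argmax() + 1.)
    return i / v.shape[0]

# User 7 is hypothetical: it has no True value in Exist
df = pd.DataFrame({
    "User": [2, 2, 2, 2, 7, 7, 7],
    "Exist": [False, True, False, True, False, False, False],
})

result = df.groupby("User").Exist.apply(first_ratio).rename("Location").to_frame()
print(result)
```

In this sketch user 2 gets 2/4 and user 7 gets 0. Whether 0 is the right value for "no True found" depends on the use case; a missing value might be preferable in some analyses.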

Timing

[Figure: timing comparison of the two options]
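A rough way to compare the two options on your own machine is sketched below with `timeit`. The function names `first_ratio_pandas` and `first_ratio_numpy`, the synthetic data sizes, the seed, and the repetition count are all arbitrary choices made for illustration, and no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

def first_ratio_pandas(x):   # Option 1
    x = x.reset_index(drop=True)
    return x.any() * (x.idxmax() + 1.) / len(x)

def first_ratio_numpy(x):    # Option 2
    v = x.values
    return v.any() * (v.argmax() + 1.) / v.shape[0]

# Synthetic data: 200 users with 50 rows each (arbitrary sizes)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "User": np.repeat(np.arange(200), 50),
    "Exist": rng.random(200 * 50) < 0.1,
})

grouped = df.groupby("User").Exist
t1 = timeit.timeit(lambda: grouped.apply(first_ratio_pandas), number=5)
t2 = timeit.timeit(lambda: grouped.apply(first_ratio_numpy), number=5)
print(f"Option 1: {t1:.4f}s  Option 2: {t2:.4f}s")
```

Timings vary with the pandas version, the number of groups, and the group sizes, so it is worth running this on data shaped like your own.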




Author:qs

link:http://www.pythonblackhole.com/blog/article/247233/ca4bfc39d50da8bd103e/

source:python black hole net
