Is there a way to find patterns in a column in a Pandas DataFrame

posted on 2023-11-12 14:30


I have a Pandas DataFrame of rows that are missing from a larger dataset. The column web_id contains the ids that were missing from the larger DataFrame.

I am trying to find a pattern in the way they went missing from the larger dataset.

The following code is reproducible on your local computer and builds a sample of my current dataset:

import pandas as pd

df = pd.DataFrame({
    "web_id": [43291, 43300, 43313, 43316, 43335, 43345, 43346, 43353, 43361, 43373, 43383, 43387, 43416],
    "date": "12/17/2019"  # scalar value, repeated for every row
})

I believe there is some sort of pattern in the missingness. How can I analyze the sequence of web_id values to better understand how the data went missing from the larger dataset?

Many thanks in advance


solution


import pandas as pd

x = pd.DataFrame({"web_id": [43291, 43300, 43313, 43316, 43335,
                             43345, 43346, 43353, 43361, 43373, 43383, 43387, 43416]})

# Flatten the single column into a plain Python list of ids
ls = x["web_id"].tolist()

# Print the gap between each pair of consecutive ids
for i in range(len(ls) - 1):
    print(ls[i + 1] - ls[i])

This prints the difference between each pair of consecutive values in the column. I did not notice any mathematical sequence, at least not with this difference method.

output: 9, 13, 3, 19, 10, 1, 7, 8, 12, 10, 4, 29
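The same consecutive differences can also be computed without an explicit loop. The following is only a minimal sketch, assuming the ids are already sorted in ascending order. The name gaps is introduced here for illustration and is not part of the original answer. Series.diff() leaves a NaN in the first row because that row has no predecessor, so the sketch drops it before summarizing.

```python
import pandas as pd

x = pd.DataFrame({"web_id": [43291, 43300, 43313, 43316, 43335, 43345, 43346,
                             43353, 43361, 43373, 43383, 43387, 43416]})

# Gap between each id and the one before it. The first row has no
# predecessor, so its NaN is dropped and the rest are cast back to int.
gaps = x["web_id"].diff().dropna().astype(int)

print(gaps.tolist())          # the consecutive differences as a list
print(gaps.describe())        # min, max, mean and quartiles of the gaps
print(gaps.value_counts())    # how often each gap size occurs
```

If the missing ids followed a regular rule, one gap size (or a short repeating cycle of sizes) would probably dominate the value_counts() output. A wide spread of gap sizes, as in this sample, gives no sign of such a rule.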

You can also search https://oeis.org/ to check whether the sequence has been catalogued before. It does not seem to be. Good luck!
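As a small convenience, the differences can be turned into an OEIS search link instead of being typed in by hand. This sketch only builds the URL and makes no network request. It assumes the site's search page accepts a comma-separated list of terms in the q query parameter. The variable names are illustrative.

```python
from urllib.parse import urlencode

# Differences between consecutive web_id values, taken from the output above
diffs = [9, 13, 3, 19, 10, 1, 7, 8, 12, 10, 4, 29]

# Join the terms with commas and keep the commas unescaped in the query string
query = ",".join(str(d) for d in diffs)
url = "https://oeis.org/search?" + urlencode({"q": query}, safe=",")

print(url)  # open this link in a browser to run the search
```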




Author:qs

link:http://www.pythonblackhole.com/blog/article/245164/bb7582cbe6fcd7a5e19c/

source:python black hole net

Please indicate the source for any form of reprinting. If any infringement is discovered, it will be held legally responsible.
