My data looks like this:
id, date, target
1,2016-10-24,22
1,2016-10-25,31
1,2016-10-27,44
1,2016-10-28,12
2,2016-10-21,22
2,2016-10-22,31
2,2016-10-25,44
2,2016-10-27,12
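I want to fill in the missing dates within each id. For example, id = 1 spans 2016-10-24 to 2016-10-28, and 2016-10-26 is missing. Likewise, id = 2 spans 2016-10-21 to 2016-10-27, and 2016-10-23, 2016-10-24, and 2016-10-26 are missing. I want to add the missing dates and set their target value to 0.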
So I would like my data to end up like this:
id, date, target
1,2016-10-24,22
1,2016-10-25,31
1,2016-10-26,0
1,2016-10-27,44
1,2016-10-28,12
2,2016-10-21,22
2,2016-10-22,31
2,2016-10-23,0
2,2016-10-24,0
2,2016-10-25,44
2,2016-10-26,0
2,2016-10-27,12
Can anyone help me?
Thanks in advance.
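You can use groupby with resample. Calling fillna on its own is then a problem, so asfreq is needed first: asfreq inserts the missing dates as NaN rows, and fillna(0) then replaces those NaN values with 0: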
import pandas as pd

# if necessary, convert the date column to datetime
df.date = pd.to_datetime(df.date)

# resample needs a DatetimeIndex
df = df.set_index('date')

# per id: resample to daily frequency, insert missing days as NaN, then fill with 0
df = df.groupby('id').resample('D')['target'].asfreq().fillna(0).astype(int).reset_index()
print (df)

    id       date  target
0    1 2016-10-24      22
1    1 2016-10-25      31
2    1 2016-10-26       0
3    1 2016-10-27      44
4    1 2016-10-28      12
5    2 2016-10-21      22
6    2 2016-10-22      31
7    2 2016-10-23       0
8    2 2016-10-24       0
9    2 2016-10-25      44
10   2 2016-10-26       0
11   2 2016-10-27      12
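If you want to see the idea end to end, here is a minimal self-contained sketch that uses a different technique from the resample approach: it reindexes each id on its own daily date_range. The helper name fill_missing_dates and the csv_text variable are made up for illustration, and the sample data is copied from the question.

```python
import io

import pandas as pd

# Sample data copied from the question (csv_text is a name chosen for this sketch).
csv_text = """id,date,target
1,2016-10-24,22
1,2016-10-25,31
1,2016-10-27,44
1,2016-10-28,12
2,2016-10-21,22
2,2016-10-22,31
2,2016-10-25,44
2,2016-10-27,12
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])


def fill_missing_dates(frame):
    """Hypothetical helper: add the missing days per id with target = 0."""
    pieces = []
    for group_id, group in frame.groupby("id"):
        # Every calendar day between this id's first and last date.
        full_range = pd.date_range(group["date"].min(), group["date"].max(), freq="D")
        filled = (
            group.set_index("date")["target"]
            .reindex(full_range, fill_value=0)  # days not in the data get 0
            .rename_axis("date")
            .reset_index()
        )
        filled.insert(0, "id", group_id)
        pieces.append(filled)
    return pd.concat(pieces, ignore_index=True)


result = fill_missing_dates(df)
print(result)
```

Because reindex is given fill_value=0, the target column stays an integer column and no astype(int) step is needed. The explicit loop is easier to step through than the one-liner, at the cost of being slower when there are many ids.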