I have a DataFrame that is already sorted by user and time:
    import pandas as pd

    df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                       'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                       'duration': [10, 5, 5, 4, 10, 4, 6]})

       duration location user
    0        10    house    A
    1         5    house    A
    2         5      gym    A
    3         4      gym    B
    4        10     shop    B
    5         4      gym    B
    6         6      gym    B
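I only want to sum() the duration when adjacent rows for a given user have the same 'location' value, so it is not simply df.groupby(['user','location']).duration.sum(). The desired output is shown below. The row order also matters.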
    duration location user
          15    house    A
           5      gym    A
           4      gym    B
          10     shop    B
          10      gym    B
Thanks!
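Pass sort=False so that the groups keep the order in which they first appear in the original DataFrame, then compute the grouped sum of the duration column.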
    adj_check = (df.location != df.location.shift()).cumsum()
    df.groupby(['user', 'location', adj_check], as_index=False, sort=False)['duration'].sum()
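The only change needed to what you tried before is the grouping condition, which puts each run of consecutive rows with the same location into its own unique group: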
    (df.location != df.location.shift()).cumsum()

    0    1
    1    1
    2    2
    3    2
    4    3
    5    4
    6    4
    Name: location, dtype: int32
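For reference, here is a minimal end-to-end sketch of the same idea that can be run as-is. It is a variation, not the exact code above: the helper Series is renamed to `block` (a name chosen here for illustration) so it does not clash with the existing `location` column, and the keys are restored with `reset_index()` instead of `as_index=False`.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
    'duration': [10, 5, 5, 4, 10, 4, 6],
})

# A new block starts whenever the location differs from the previous row.
# 'block' is an illustrative name, not part of the original answer.
block = (df['location'] != df['location'].shift()).cumsum().rename('block')

# Grouping by user as well keeps A's gym run separate from B's gym run,
# even though both rows share the same block id.
result = (
    df.groupby(['user', 'location', block], sort=False)['duration']
      .sum()
      .reset_index()          # turn the group keys back into columns
      .drop(columns='block')  # the helper key is no longer needed
)

print(result)
```

With the sample data, the summed durations come out as 15, 5, 4, 10, 10, in the original row order. Because `user` is part of the group keys, a location run that spans two users is still split at the user boundary.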