这是对我之前发布的问题的调整.这是我的数据:
set.seed(3737) DF2 = data.frame(user_id = c(rep(27, 7), rep(11, 7)), date = as.Date(rep(c('2016-01-01', '2016-01-03', '2016-01-05', '2016-01-07', '2016-01-10', '2016-01-14', '2016-01-16'), 2)), value = round(rnorm(14, 15, 5), 1)) user_id date value 27 2016-01-01 15.0 27 2016-01-03 22.4 27 2016-01-05 13.3 27 2016-01-07 21.9 27 2016-01-10 20.6 27 2016-01-14 18.6 27 2016-01-16 16.4 11 2016-01-01 6.8 11 2016-01-03 21.3 11 2016-01-05 19.8 11 2016-01-07 22.0 11 2016-01-10 19.4 11 2016-01-14 17.5 11 2016-01-16 19.3
这一次,我想计算指定时间段内value
每个的累积总和user_id
'; 例如,持续7天,14天.理想的解决方案如下所示:
user_id date value v_minus7 v_minus14 27 2016-01-01 15.0 15.0 15.0 27 2016-01-03 22.4 37.4 37.4 27 2016-01-05 13.3 50.7 50.7 27 2016-01-07 21.9 72.6 72.6 27 2016-01-10 20.6 78.2 93.2 27 2016-01-14 18.6 61.1 111.8 27 2016-01-16 16.4 55.6 113.2 11 2016-01-01 6.8 6.8 6.8 11 2016-01-03 21.3 28.1 28.1 11 2016-01-05 19.8 47.9 47.9 11 2016-01-07 22.0 69.9 69.9 11 2016-01-10 19.4 82.5 89.3 11 2016-01-14 17.5 58.9 106.8 11 2016-01-16 19.3 56.2 119.3
理想情况下,我想用dplyr
它,但其他包也没关系.
逻辑:第一组user_id
,然后是date
.现在,对于每个数据子集,我们检查当前日期和7/14天之间的所有日期,使用between()
哪个日期返回逻辑向量.
基于此逻辑向量,我添加了value
列
library(data.table) setDT(DF2)[, `:=`(v_minus7 = sum(DF2$value[DF2$user_id == user_id][between(DF2$date[DF2$user_id == user_id], date-7, date, incbounds = TRUE)]), v_minus14 = sum(DF2$value[DF2$user_id == user_id][between(DF2$date[DF2$user_id == user_id], date-14, date, incbounds = TRUE)])), by = c("user_id", "date")][] # user_id date value v_minus7 v_minus14 #1: 27 2016-01-01 15.0 15.0 15.0 #2: 27 2016-01-03 22.4 37.4 37.4 #3: 27 2016-01-05 13.3 50.7 50.7 #4: 27 2016-01-07 21.9 72.6 72.6 #5: 27 2016-01-10 20.6 78.2 93.2 #6: 27 2016-01-14 18.6 61.1 111.8 #7: 27 2016-01-16 16.4 55.6 113.2 #8: 11 2016-01-01 6.8 6.8 6.8 #9: 11 2016-01-03 21.3 28.1 28.1 #10: 11 2016-01-05 19.8 47.9 47.9 #11: 11 2016-01-07 22.0 69.9 69.9 #12: 11 2016-01-10 19.4 82.5 89.3 #13: 11 2016-01-14 17.5 58.9 106.8 #14: 11 2016-01-16 19.3 56.2 119.3
# from alexis_laz answer. ff = function(date, value, minus){ cs = cumsum(value) i = findInterval(date - minus, date, rightmost.closed = TRUE) w = which(as.logical(i)) i[w] = cs[i[w]] cs - i } setDT(DF2) DF2[, `:=`( v_minus7 = ff(date, value, 7), v_minus14 = ff(date, value, 14)), by = c("user_id")]