I need to find the running maximum of a variable by group in R. Within each group, the rows are sorted by time using df[order(df$group, df$time),].
My variable has some NAs, but I can handle them by replacing them with zeros.
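A note on the NA handling: cummax() propagates NA, so once an NA appears, every later value in that group becomes NA. The base-R sketch below (the vector x is a made-up example, not the question's data) shows the zero replacement. Replacing with 0 only leaves the running maximum unchanged when all values are non-negative, as they are here; for data that may contain negatives, -Inf is the safer filler because it can never raise the maximum.

```r
# Made-up vector with a missing value (illustration only)
x <- c(5L, NA, 3L, 7L)

# cummax() propagates NA: the NA and every later element become NA
print(cummax(x))

# Replace NA with 0 (safe for non-negative data) before the running max
x_zero <- replace(x, is.na(x), 0L)
print(cummax(x_zero))

# For data that may be negative, -Inf never raises the running maximum
x_inf <- replace(as.numeric(x), is.na(x), -Inf)
print(cummax(x_inf))
```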
Here is what the data frame df looks like:
(df <- structure(list(var = c(5L, 2L, 3L, 4L, 0L, 3L, 6L, 4L, 8L, 4L),
                      group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
                                        .Label = c("a", "b"), class = "factor"),
                      time = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)),
                 .Names = c("var", "group", "time"),
                 class = "data.frame", row.names = c(NA, -10L)))
#    var group time
# 1    5     a    1
# 2    2     a    2
# 3    3     a    3
# 4    4     a    4
# 5    0     a    5
# 6    3     b    1
# 7    6     b    2
# 8    4     b    3
# 9    8     b    4
# 10   4     b    5
I would like a new variable, curMax:
var | group | time | curMax
  5 | a     |    1 |      5
  2 | a     |    2 |      5
  3 | a     |    3 |      5
  4 | a     |    4 |      5
  0 | a     |    5 |      5
  3 | b     |    1 |      3
  6 | b     |    2 |      6
  4 | b     |    3 |      6
  8 | b     |    4 |      8
  4 | b     |    5 |      8
If you have any idea how to do this in R, please let me know.
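For comparison, here is a base-R sketch (not taken from the answers below) using ave(), which applies a function separately within each level of a grouping factor and returns the results aligned with the original rows. It rebuilds the question's data with data.frame() rather than structure(), and it assumes the rows are first sorted with the order() call from the question.

```r
# Rebuild the question's data
df <- data.frame(
  var   = c(5L, 2L, 3L, 4L, 0L, 3L, 6L, 4L, 8L, 4L),
  group = factor(rep(c("a", "b"), each = 5)),
  time  = rep(1:5, times = 2)
)

# Sort by time within group, as in the question
df <- df[order(df$group, df$time), ]

# ave() applies cummax() to each group's values and keeps the row alignment
df$curMax <- ave(df$var, df$group, FUN = cummax)
print(df)
```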
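We can try data.table. Convert the 'data.frame' (called df1 here; it is the df from the question) to a 'data.table' with setDT(df1), group by 'group', take the cummax of 'var', and assign it (:=) to a new variable ('curMax'):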
library(data.table)
setDT(df1)[, curMax := cummax(var), by = group]
As @Michael Chirico commented, if the data are not ordered by 'time', we can do the ordering in 'i':
setDT(df1)[order(time), curMax := cummax(var), by = group]
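With order(time) in 'i', the rows are visited in time order only for the computation, and := writes the result back into the original rows, so the table's row order does not change. A rough base-R analogue (my own sketch, not part of the data.table answer) computes over an index vector o in (group, time) order and assigns back through the same index. The reversed copy df_rev is just a hypothetical unsorted input.

```r
df <- data.frame(
  var   = c(5L, 2L, 3L, 4L, 0L, 3L, 6L, 4L, 8L, 4L),
  group = factor(rep(c("a", "b"), each = 5)),
  time  = rep(1:5, times = 2)
)

# Hypothetical unsorted input: the same rows in reverse order
df_rev <- df[nrow(df):1, ]

# Index that visits the rows in (group, time) order
o <- order(df_rev$group, df_rev$time)

# Running max computed in that order, written back to the same rows;
# the row order of df_rev itself is left unchanged
df_rev$curMax <- NA_integer_
df_rev$curMax[o] <- ave(df_rev$var[o], df_rev$group[o], FUN = cummax)
print(df_rev)
```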
Or with dplyr:
library(dplyr)
df1 %>%
  group_by(group) %>%
  mutate(curMax = cummax(var))
If df1 is a tbl_sql, explicit ordering may be needed; use arrange:
df1 %>%
  group_by(group) %>%
  arrange(time, .by_group = TRUE) %>%
  mutate(curMax = cummax(var))
Or use dbplyr::window_order:
library(dbplyr)
df1 %>%
  group_by(group) %>%
  window_order(time) %>%
  mutate(curMax = cummax(var))