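I am trying to get every id/year/month combination to have rows for all seven weekdays, with NA for the weekdays that are missing.
Here are the data frames and my attempt at this: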
    > df
      id year month weekday  amount
    1  1 2015     1  Friday 3650.43
    2  2 2015     1  Monday 1271.12
    3  1 2015     2  Friday 1315.79
    4  2 2015     2  Monday 2195.37
    > wday
        weekday
    1    Friday
    2  Saturday
    3 Wednesday
    4    Sunday
    5   Tuesday
    6    Monday
    7  Thursday
I tried using group_by() and a right join, but it did not produce what I expected. Is there a simple way to get the result I am after?
    > df <- df %>% group_by(id, year, month) %>% right_join(wday)
    Joining by: "weekday"
    > df
    Source: local data frame [9 x 5]
    Groups: id, year, month [?]

         id  year month   weekday  amount
      (dbl) (int) (int)     (chr)   (dbl)
    1     1  2015     1    Friday 3650.43
    2     1  2015     2    Friday 1315.79
    3    NA    NA    NA  Saturday      NA
    4    NA    NA    NA Wednesday      NA
    5    NA    NA    NA    Sunday      NA
    6    NA    NA    NA   Tuesday      NA
    7     2  2015     1    Monday 1271.12
    8     2  2015     2    Monday 2195.37
    9    NA    NA    NA  Thursday      NA
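I want 7 rows for every id/year/month combination, where the amount for missing weekdays is NA (or ideally zero, but I know how to get that with mutate()).
The resulting data frame should look like this: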
    > df
       id year month   weekday  amount
    1   1 2015     1    Friday 3650.43
    2   1 2015     1    Monday    0.00
    3   1 2015     1  Saturday    0.00
    4   1 2015     1    Sunday    0.00
    5   1 2015     1  Thursday    0.00
    6   1 2015     1   Tuesday    0.00
    7   1 2015     1 Wednesday    0.00
    8   1 2015     2    Friday 1315.79
    9   1 2015     2    Monday    0.00
    10  1 2015     2  Saturday    0.00
    11  1 2015     2    Sunday    0.00
    12  1 2015     2  Thursday    0.00
    13  1 2015     2   Tuesday    0.00
    14  1 2015     2 Wednesday    0.00
    15  2 2015     1    Friday    0.00
    16  2 2015     1    Monday 1271.12
    17  2 2015     1  Saturday    0.00
    18  2 2015     1    Sunday    0.00
    19  2 2015     1  Thursday    0.00
    20  2 2015     1   Tuesday    0.00
    21  2 2015     1 Wednesday    0.00
    22  2 2015     2    Friday    0.00
    23  2 2015     2    Monday 2195.37
    24  2 2015     2  Saturday    0.00
    25  2 2015     2    Sunday    0.00
    26  2 2015     2  Thursday    0.00
    27  2 2015     2   Tuesday    0.00
    28  2 2015     2 Wednesday    0.00
akrun.. 11
We can use expand.grid:
    library(dplyr)

    # Build every combination of the unique id, year, month values and the weekdays,
    # join the observed amounts back on, then replace the missing amounts with 0.
    expand.grid(c(lapply(df[1:3], unique), wday['weekday'])) %>%
        left_join(., df) %>%
        mutate(amount = replace(amount, is.na(amount), 0)) %>%
        arrange(id, year, month, weekday)
    #   id year month   weekday  amount
    #1   1 2015     1    Friday 3650.43
    #2   1 2015     1    Monday    0.00
    #3   1 2015     1  Saturday    0.00
    #4   1 2015     1    Sunday    0.00
    #5   1 2015     1  Thursday    0.00
    #6   1 2015     1   Tuesday    0.00
    #7   1 2015     1 Wednesday    0.00
    #8   1 2015     2    Friday 1315.79
    #9   1 2015     2    Monday    0.00
    #10  1 2015     2  Saturday    0.00
    #11  1 2015     2    Sunday    0.00
    #12  1 2015     2  Thursday    0.00
    #13  1 2015     2   Tuesday    0.00
    #14  1 2015     2 Wednesday    0.00
    #15  2 2015     1    Friday    0.00
    #16  2 2015     1    Monday 1271.12
    #17  2 2015     1  Saturday    0.00
    #18  2 2015     1    Sunday    0.00
    #19  2 2015     1  Thursday    0.00
    #20  2 2015     1   Tuesday    0.00
    #21  2 2015     1 Wednesday    0.00
    #22  2 2015     2    Friday    0.00
    #23  2 2015     2    Monday 2195.37
    #24  2 2015     2  Saturday    0.00
    #25  2 2015     2    Sunday    0.00
    #26  2 2015     2  Thursday    0.00
    #27  2 2015     2   Tuesday    0.00
    #28  2 2015     2 Wednesday    0.00
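One thing to keep in mind with the expand.grid approach: it combines the unique values of id, year and month independently, so it would also create id/year/month combinations that never occur in df (there are none in this sample, since every id appears in every month). A minimal base R sketch, assuming only the observed combinations should be expanded, cross joins the distinct key rows with wday instead. The data frames are reconstructed from the question; `keys` and `full` are illustrative names.

```r
# Sample data reconstructed from the question.
df <- data.frame(
  id = c(1, 2, 1, 2),
  year = 2015L,
  month = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# Cartesian product of the observed id/year/month rows and the weekdays
# (merge with by = NULL returns a cross join).
keys <- merge(unique(df[c("id", "year", "month")]), wday, by = NULL)

# Left join the amounts back on; weekdays without a match get NA.
full <- merge(keys, df, by = c("id", "year", "month", "weekday"), all.x = TRUE)

# Replace the missing amounts with 0 and sort for readability.
full$amount[is.na(full$amount)] <- 0
full <- full[order(full$id, full$year, full$month, full$weekday), ]
```

With the sample data this gives 7 rows for each of the 4 observed id/year/month combinations.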
G. Grothendi.. 6
sqldf: For complex joins it is often easier to use SQL:
    library(sqldf)
    sqldf("select id, year, month, wday.weekday,
             sum((df.weekday = wday.weekday) * amount) amount
           from df
           join wday
           group by 1, 2, 3, 4")
giving:
       id year month   weekday  amount
    1   1 2015     1    Friday 3650.43
    2   1 2015     1  Saturday    0.00
    3   1 2015     1 Wednesday    0.00
    4   1 2015     1    Sunday    0.00
    5   1 2015     1   Tuesday    0.00
    6   1 2015     1    Monday    0.00
    7   1 2015     1  Thursday    0.00
    8   2 2015     1    Friday    0.00
    9   2 2015     1  Saturday    0.00
    10  2 2015     1 Wednesday    0.00
    11  2 2015     1    Sunday    0.00
    12  2 2015     1   Tuesday    0.00
    13  2 2015     1    Monday 1271.12
    14  2 2015     1  Thursday    0.00
    15  1 2015     2    Friday 1315.79
    16  1 2015     2  Saturday    0.00
    17  1 2015     2 Wednesday    0.00
    18  1 2015     2    Sunday    0.00
    19  1 2015     2   Tuesday    0.00
    20  1 2015     2    Monday    0.00
    21  1 2015     2  Thursday    0.00
    22  2 2015     2    Friday    0.00
    23  2 2015     2  Saturday    0.00
    24  2 2015     2 Wednesday    0.00
    25  2 2015     2    Sunday    0.00
    26  2 2015     2   Tuesday    0.00
    27  2 2015     2    Monday 2195.37
    28  2 2015     2  Thursday    0.00
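The query above works because SQLite evaluates a comparison to 1 or 0, so multiplying by amount keeps the amount only on the row where the two weekdays match. The same masking idea carries over to R, where TRUE and FALSE coerce to 1 and 0 in arithmetic. A small sketch; the three vectors are made-up illustrations that reuse amounts from the question:

```r
# Three rows of a hypothetical cross join: weekday from df (x) vs. weekday from wday (y).
weekday_x <- c("Friday", "Friday", "Monday")
weekday_y <- c("Friday", "Monday", "Monday")
amount    <- c(3650.43, 3650.43, 1271.12)

# The logical comparison is coerced to 1/0, so non-matching rows become 0.
masked <- (weekday_x == weekday_y) * amount
masked
```

Summing `masked` within each id/year/month/weekday group then yields the amount for observed weekdays and 0 for the rest.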
base R: We can replicate this in base R using merge and transform:
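    # by = c() requests the Cartesian product of df and wday;
    # the shared weekday column is split into weekday.x and weekday.y.
    xt <- transform(
        merge(df, wday, by = c()),
        amount = (as.character(weekday.x) == as.character(weekday.y)) * amount,
        weekday = weekday.y,
        weekday.x = NULL,
        weekday.y = NULL
    )
    # Sum the masked amounts within each id/year/month/weekday group.
    aggregate(amount ~ ., xt, sum)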
dplyr: If we really want to use dplyr, we can replace transform with mutate, rename and select:
    library(dplyr)
    merge(df, wday, by = c()) %>%
        mutate(amount = (as.character(weekday.x) == as.character(weekday.y)) * amount) %>%
        rename(weekday = weekday.y) %>%
        select(-weekday.x) %>%
        group_by(id, year, month, weekday) %>%
        summarise(amount = sum(amount))
Note: If each group has only one weekday (as in the question), we could optionally omit the group by/sum, the aggregate, and the group_by/summarise in the three solutions respectively.
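To illustrate the note, here is a hedged base R sketch of the merge approach with the aggregation step left out. It assumes, as in the question's sample, that each id/year/month group contains exactly one weekday; the data frames are reconstructed from the question and the column handling is written out explicitly rather than with transform.

```r
# Sample data reconstructed from the question.
df <- data.frame(
  id = c(1, 2, 1, 2),
  year = 2015L,
  month = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# Cross join: every df row paired with every weekday (weekday.x from df, weekday.y from wday).
xt <- merge(df, wday, by = NULL)

# Keep the amount only where the two weekdays agree, otherwise 0.
xt$amount <- (xt$weekday.x == xt$weekday.y) * xt$amount
xt$weekday <- xt$weekday.y
xt <- xt[c("id", "year", "month", "weekday", "amount")]
```

Because every group contributes a single df row, each id/year/month/weekday combination already appears exactly once in `xt`, so there is nothing left to sum. With several weekdays per group the cross join would produce repeated combinations and the aggregation would be needed again.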