
dplyr: right join after group_by does not produce the expected result


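I am trying to get every id/year/month combination to have rows for all seven weekdays, with NA for the weekdays that are missing.

Here are the data frames and my attempt at this: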

> df
  id year month weekday  amount
1  1 2015     1  Friday 3650.43
2  2 2015     1  Monday 1271.12
3  1 2015     2  Friday 1315.79
4  2 2015     2  Monday 2195.37
> wday
    weekday
1    Friday
2  Saturday
3 Wednesday
4    Sunday
5   Tuesday
6    Monday
7  Thursday

I tried group_by() followed by a right join, but it does not produce what I expected. Is there a simple way to get the result I am after?

> df <- df %>% group_by(id, year, month) %>% right_join(wday)
Joining by: "weekday"
> df
Source: local data frame [9 x 5]
Groups: id, year, month [?]

     id  year month   weekday  amount
  (dbl) (int) (int)     (chr)   (dbl)
1     1  2015     1    Friday 3650.43
2     1  2015     2    Friday 1315.79
3    NA    NA    NA  Saturday      NA
4    NA    NA    NA Wednesday      NA
5    NA    NA    NA    Sunday      NA
6    NA    NA    NA   Tuesday      NA
7     2  2015     1    Monday 1271.12
8     2  2015     2    Monday 2195.37
9    NA    NA    NA  Thursday      NA

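I want 7 rows for each id/year/month combination, where amount is NA for the missing weekdays (or ideally zero, but I know how to get that with mutate()).

The resulting data frame should look like this: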

> df
   id year month   weekday  amount
1   1 2015     1    Friday 3650.43
2   1 2015     1    Monday    0.00
3   1 2015     1  Saturday    0.00
4   1 2015     1    Sunday    0.00
5   1 2015     1  Thursday    0.00
6   1 2015     1   Tuesday    0.00
7   1 2015     1 Wednesday    0.00
8   1 2015     2    Friday 1315.79
9   1 2015     2    Monday    0.00
10  1 2015     2  Saturday    0.00
11  1 2015     2    Sunday    0.00
12  1 2015     2  Thursday    0.00
13  1 2015     2   Tuesday    0.00
14  1 2015     2 Wednesday    0.00
15  2 2015     1    Friday    0.00
16  2 2015     1    Monday 1271.12
17  2 2015     1  Saturday    0.00
18  2 2015     1    Sunday    0.00
19  2 2015     1  Thursday    0.00
20  2 2015     1   Tuesday    0.00
21  2 2015     1 Wednesday    0.00
22  2 2015     2    Friday    0.00
23  2 2015     2    Monday 2195.37
24  2 2015     2  Saturday    0.00
25  2 2015     2    Sunday    0.00
26  2 2015     2  Thursday    0.00
27  2 2015     2   Tuesday    0.00
28  2 2015     2 Wednesday    0.00

akrun.. 11

We can use expand.grid:

expand.grid(c(lapply(df[1:3], unique), wday['weekday'])) %>% 
       left_join(., df) %>%
       mutate(amount=replace(amount, is.na(amount), 0)) %>% 
       arrange(id, year, month, weekday)
#    id year month   weekday  amount
#1   1 2015     1    Friday 3650.43
#2   1 2015     1    Monday    0.00
#3   1 2015     1  Saturday    0.00
#4   1 2015     1    Sunday    0.00
#5   1 2015     1  Thursday    0.00
#6   1 2015     1   Tuesday    0.00
#7   1 2015     1 Wednesday    0.00
#8   1 2015     2    Friday 1315.79
#9   1 2015     2    Monday    0.00
#10  1 2015     2  Saturday    0.00
#11  1 2015     2    Sunday    0.00
#12  1 2015     2  Thursday    0.00
#13  1 2015     2   Tuesday    0.00
#14  1 2015     2 Wednesday    0.00
#15  2 2015     1    Friday    0.00
#16  2 2015     1    Monday 1271.12
#17  2 2015     1  Saturday    0.00
#18  2 2015     1    Sunday    0.00
#19  2 2015     1  Thursday    0.00
#20  2 2015     1   Tuesday    0.00
#21  2 2015     1 Wednesday    0.00
#22  2 2015     2    Friday    0.00
#23  2 2015     2    Monday 2195.37
#24  2 2015     2  Saturday    0.00
#25  2 2015     2    Sunday    0.00
#26  2 2015     2  Thursday    0.00
#27  2 2015     2   Tuesday    0.00
#28  2 2015     2 Wednesday    0.00
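
To try the snippet above from a clean session, the two inputs have to exist first. The following is a minimal sketch that rebuilds them from the printout in the question; the data.frame() construction, stringsAsFactors = FALSE and the explicit by = argument are assumptions, since the question does not show how df and wday were created. The names grid and res are only illustrative.

```r
library(dplyr)

# Inputs rebuilt from the printed output in the question (assumed construction).
df <- data.frame(
  id      = c(1, 2, 1, 2),
  year    = c(2015L, 2015L, 2015L, 2015L),
  month   = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount  = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# Every combination of the unique id, year, month values with the 7 weekdays.
grid <- expand.grid(
  id      = unique(df$id),
  year    = unique(df$year),
  month   = unique(df$month),
  weekday = wday$weekday,
  stringsAsFactors = FALSE
)

# Attach the known amounts; combinations without a match get NA, then 0.
res <- grid %>%
  left_join(df, by = c("id", "year", "month", "weekday")) %>%
  mutate(amount = ifelse(is.na(amount), 0, amount)) %>%
  arrange(id, year, month, weekday)
```

Note that expand.grid crosses the unique values of each column independently, so it also produces id/year/month combinations that never occur together in df. For the data in the question every combination exists, so this makes no difference here.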


G. Grothendi.. 6

sqldf: For complex joins it is often easier to use SQL:

library(sqldf)
sqldf("select 
         id, 
         year, 
         month, 
         wday.weekday, 
         sum((df.weekday = wday.weekday) * amount) amount 
       from df 
       join wday
       group by 1, 2, 3, 4")

giving:

   id year month   weekday  amount
1   1 2015     1    Friday 3650.43
2   1 2015     1  Saturday    0.00
3   1 2015     1 Wednesday    0.00
4   1 2015     1    Sunday    0.00
5   1 2015     1   Tuesday    0.00
6   1 2015     1    Monday    0.00
7   1 2015     1  Thursday    0.00
8   2 2015     1    Friday    0.00
9   2 2015     1  Saturday    0.00
10  2 2015     1 Wednesday    0.00
11  2 2015     1    Sunday    0.00
12  2 2015     1   Tuesday    0.00
13  2 2015     1    Monday 1271.12
14  2 2015     1  Thursday    0.00
15  1 2015     2    Friday 1315.79
16  1 2015     2  Saturday    0.00
17  1 2015     2 Wednesday    0.00
18  1 2015     2    Sunday    0.00
19  1 2015     2   Tuesday    0.00
20  1 2015     2    Monday    0.00
21  1 2015     2  Thursday    0.00
22  2 2015     2    Friday    0.00
23  2 2015     2  Saturday    0.00
24  2 2015     2 Wednesday    0.00
25  2 2015     2    Sunday    0.00
26  2 2015     2   Tuesday    0.00
27  2 2015     2    Monday 2195.37
28  2 2015     2  Thursday    0.00

base R: We can replicate this in base R using merge and transform:

xt <- transform(
  merge(df, wday, by = c()),
  amount = (as.character(weekday.x) == as.character(weekday.y)) * amount, 
  weekday = weekday.y, 
  weekday.x = NULL, 
  weekday.y = NULL
)
aggregate(amount ~., xt, sum)
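
The base R version relies on two things: merge() with a zero-length by returns the Cartesian product of its inputs, and multiplying a logical vector by a number keeps the number where the condition is TRUE and gives 0 elsewhere. A small sketch that makes both steps visible, with df and wday rebuilt from the question's printout (an assumed construction) and cp and masked as illustrative names:

```r
# Inputs rebuilt from the printed output in the question (assumed construction).
df <- data.frame(
  id      = c(1, 2, 1, 2),
  year    = c(2015L, 2015L, 2015L, 2015L),
  month   = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount  = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# Zero-length `by`: every row of df is paired with every row of wday.
cp <- merge(df, wday, by = c())
dim(cp)    # 4 rows of df times 7 rows of wday
names(cp)  # the shared column is split into weekday.x (df) and weekday.y (wday)

# TRUE * amount keeps the amount, FALSE * amount gives 0.
masked <- (cp$weekday.x == cp$weekday.y) * cp$amount
```

Only the rows where the weekday from df equals the weekday from wday keep their amount; all other pairings contribute 0, which is why the later sum per id/year/month/weekday returns the expected values.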

dplyr: If we really want to use dplyr, we can replace transform with mutate, rename and select:

library(dplyr)
merge(df, wday, by = c()) %>% 
 mutate(amount = (as.character(weekday.x) == as.character(weekday.y)) * amount) %>%
 rename(weekday = weekday.y) %>%
 select(-weekday.x) %>%
 group_by(id, year, month, weekday) %>%
 summarise(amount = sum(amount))

Note: If each group has only one weekday (as in the question), we can optionally omit the group by/sum, the aggregate, and the group_by/summarize in the three solutions respectively.
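
Whether the aggregation step can be dropped can be checked before relying on it. The sketch below is one possible check, not part of the original answer: it tests that each id/year/month group has a single row, then confirms that aggregating the cross join does not change the row count. The inputs are rebuilt from the question's printout (assumed construction), and group_cols, one_row_per_group, cp and agg are illustrative names.

```r
# Inputs rebuilt from the printed output in the question (assumed construction).
df <- data.frame(
  id      = c(1, 2, 1, 2),
  year    = c(2015L, 2015L, 2015L, 2015L),
  month   = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount  = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# One row per id/year/month group means the cross join already has 7 rows per group.
group_cols <- c("id", "year", "month")
one_row_per_group <- anyDuplicated(df[group_cols]) == 0

# Cross join plus masking, without any aggregation.
cp <- merge(df, wday, by = c())
cp$amount <- (cp$weekday.x == cp$weekday.y) * cp$amount
names(cp)[names(cp) == "weekday.y"] <- "weekday"
cp$weekday.x <- NULL

# Aggregating anyway should leave the number of rows unchanged in that case.
agg <- aggregate(amount ~ ., cp, sum)
nrow(cp) == nrow(agg)
```

If one_row_per_group is FALSE, a group contributes 7 rows for each of its original rows, and the sum is needed to collapse them back to one row per weekday.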


