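I am trying to put together a function that creates a subset of my original data frame and then uses dplyr's select and mutate to give me the number of large/small entries, based on the sum of the sepal/petal width and length.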
    filter <- function (spp, LENGTH, WIDTH) {
      d <- subset (iris, subset=iris$Species == spp) # This part seems to work just fine
      large <- d %>%
        select (LENGTH, WIDTH) %>% # This is where the problem arises.
        mutate (sum = LENGTH + WIDTH)
      big_samples <- which(large$sum > 4)
      return (length(big_samples))
    }
Basically, I want the function to return the number of large flowers. However, when I run the function I get the following error:
    filter("virginica", "Sepal.Length", "Sepal.Width")
    Error: All select() inputs must resolve to integer column positions.
    The following do not:
    *  LENGTH
    *  WIDTH
What am I doing wrong?
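You are running into an NSE/SE problem; see the vignette for more information.
In short, dplyr uses non-standard evaluation (NSE) of names, so passing column names into a function as arguments breaks it unless you use the standard evaluation (SE) versions.
The SE versions of the dplyr functions end in _. You can see that select_ works fine with your original arguments.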
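To see why a variable holding a string does not behave like a typed column name, here is a small base-R sketch of non-standard evaluation. Note that show_arg is a hypothetical helper, not part of dplyr, and dplyr's real machinery is more involved. The sketch only illustrates that an NSE function sees the expression you typed, not just its value.

```r
# show_arg() is a hypothetical helper: it returns the expression it was
# called with, as text, instead of evaluating it.
show_arg <- function(x) deparse(substitute(x))

LENGTH <- "Sepal.Length"

typed_name <- show_arg(Sepal.Length)  # caller typed a bare column name
typed_var  <- show_arg(LENGTH)        # caller typed a variable holding a string

print(typed_name)
print(typed_var)
```

Inside the function from the question, select() and mutate() are handed the expressions LENGTH and WIDTH rather than the bare names Sepal.Length and Sepal.Width, which is why the NSE verbs cannot be used unchanged there.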
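However, things get more complicated when the expression involves functions. We can use lazyeval::interp to convert most function arguments into column names; see the conversion of the mutate call to a mutate_ call in your function below and, more generally, the help page: ?lazyeval::interp
Try: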
    library(dplyr)

    filter <- function (spp, LENGTH, WIDTH) {
      d <- subset (iris, subset=iris$Species == spp)
      large <- d %>%
        select_(LENGTH, WIDTH) %>%
        # Build the formula ~X + Y, substituting the column names for X and Y
        mutate_(sum = lazyeval::interp(~X + Y, X = as.name(LENGTH), Y = as.name(WIDTH)))
      big_samples <- which(large$sum > 4)
      return (length(big_samples))
    }
UPDATE: As of dplyr 0.7.0, you can use tidy eval to accomplish this.
See http://dplyr.tidyverse.org/articles/programming.html for more details.
    library(dplyr)

    filter_big <- function(spp, LENGTH, WIDTH) {
      LENGTH <- enquo(LENGTH)                    # Create quosure
      WIDTH  <- enquo(WIDTH)                     # Create quosure
      iris %>%
        filter(Species == spp) %>%
        select(!!LENGTH, !!WIDTH) %>%            # Use !! to unquote the quosure
        mutate(sum = (!!LENGTH) + (!!WIDTH)) %>% # Use !! to unquote the quosure
        filter(sum > 4) %>%
        nrow()
    }

    filter_big("virginica", Sepal.Length, Sepal.Width)
    [1] 50
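The tidy eval version above expects bare column names. If you would rather keep passing strings, as in the original question, the following is a minimal sketch that assumes dplyr 1.0.0 or later, where all_of() and the .data pronoun are available. The name count_big and its argument names are hypothetical, chosen only for this illustration.

```r
library(dplyr)

# count_big is a hypothetical name; the column names are passed as strings.
count_big <- function(spp, length_col, width_col) {
  iris %>%
    filter(Species == spp) %>%
    # all_of() treats the character vector as a set of column names
    select(all_of(c(length_col, width_col))) %>%
    # .data[[...]] looks up a column by the string stored in the variable
    mutate(sum = .data[[length_col]] + .data[[width_col]]) %>%
    filter(sum > 4) %>%
    nrow()
}

n_big <- count_big("virginica", "Sepal.Length", "Sepal.Width")
print(n_big)
```

This avoids both the deprecated underscore verbs and the need for callers to know about quosures, at the cost of losing the bare-name interface.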