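I am trying to use the tidyverse's quasiquotation in R inside my own function. I have already read this post: Passing a list of arguments to a function with quasiquotation, and everything here: https://tidyeval.tidyverse.org/
But I still can't get it to work.
Assume I have the following data: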
dat <- data.frame(time = runif(20), group1 = rep(1:2, times = 10), group2 = rep(1:2, each = 10), group3 = rep(3:4, each = 10))
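What I want to do now is write a function that does the following:
takes a data set
specifies the variable that contains the time (note that in another data set this might be called "hours" or "qtime" or something else)
specifies which groups I want to operate on / compute statistics for
So I would like the user to call the function like this: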
test_function(data = dat, time_var = "time", group_vars = c("group1", "group3"))
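Note that next time I might choose different grouping variables, or none at all.
Assume that inside the function I want to:
compute some statistics on the time variable, e.g. quantiles. Note: I want to split by my grouping variables
Here is one of my most recent attempts: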
test_function <- function(data, time_var = NULL, group_vars = NULL) {
  # Note I initialize the variables with NULL, since e.g. the user might not
  # specify a grouping and I want to check for that in my function at some point
  time_var <- enquo(time_var)
  group_vars <- enquos(group_vars)

  # Here I try to group by my grouping variables
  temp_data <- data %>%
    group_by_at(group_vars) %>%
    mutate(!!sym(time_var) := !!sym(time_var) / 60)

  # Here I'm calculating some stats
  time_stats <- temp_data %>%
    summarize_at(vars(!!time_var),
                 list(p0.1_time = ~quantile(., probs = 0.1, na.rm = T),
                      p0.2_time = ~quantile(., probs = 0.2, na.rm = T),
                      p0.3_time = ~quantile(., probs = 0.3, na.rm = T),
                      p0.4_time = ~quantile(., probs = 0.4, na.rm = T),
                      p0.5_time = ~quantile(., probs = 0.5, na.rm = T),
                      p0.6_time = ~quantile(., probs = 0.6, na.rm = T),
                      p0.7_time = ~quantile(., probs = 0.7, na.rm = T),
                      p0.8_time = ~quantile(., probs = 0.8, na.rm = T),
                      p0.9_time = ~quantile(., probs = 0.9, na.rm = T),
                      p0.95_time = ~quantile(., probs = 0.95, na.rm = T)))
}
What is wrong with my code? In particular, I am struggling with !!, !!!, sym, enquo and enquos. Why does group_by_at not need !!, while my summarize and mutate do?
Make these changes:
use sym and syms rather than enquo and enquos, since the arguments are passed as character strings, and unquote them with !! and !!! respectively.
create po as a list column and then use unnest_wider to expand it into columns.
quantile is already vectorized over probs, so we don't need map.
the mutate can be incorporated right into the quantile call, eliminating it.
consolidate the pipelines into a single pipeline.
use TRUE rather than T, since the latter can be masked by a variable of that name whereas no variable may be called TRUE.
we can use plain group_by and summarize.
there is no group3 in the sample data so we used group2 instead.
the function does not make sense without time_var, so remove its default of NULL.
This gives the following code:
library(dplyr)
library(tidyr)

test_function <- function(data, time_var, group_vars = NULL) {
  # probabilities for the quantiles: 0.1, 0.2, ..., 0.9, 0.95
  p <- c(1:9/10, 0.95)
  # convert the column names from strings to symbols
  time_var <- sym(time_var)
  group_vars <- syms(group_vars)
  data %>%
    group_by(!!!group_vars) %>%
    summarize(po = list(quantile(!!time_var / 60, p, na.rm = TRUE))) %>%
    ungroup %>%
    unnest_wider(po)
}

test_function(data = dat, time_var = "time", group_vars = c("group1", "group2"))
giving:
# A tibble: 4 x 12
  group1 group2   `10%`   `20%`   `30%`   `40%`   `50%`   `60%`   `70%`   `80%`   `90%`   `95%`
1      1      1 0.00237 0.00432 0.00654 0.00903 0.0115  0.0120  0.0124  0.0133  0.0147  0.0154
2      1      2 0.00244 0.00251 0.00281 0.00335 0.00388 0.00410 0.00432 0.00493 0.00591 0.00640
3      2      1 0.00371 0.00381 0.00468 0.00632 0.00796 0.0101  0.0122  0.0136  0.0143  0.0147
4      2      2 0.00385 0.00538 0.00630 0.00660 0.00691 0.00725 0.00759 0.00907 0.0117  0.0130
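As a small sketch of why sym and syms fit here: enquo and enquos capture unquoted expressions typed by the caller, whereas this function receives column names as strings, and sym and syms turn strings into symbols that !! and !!! can splice in. The example below also illustrates the "no grouping" case from the question, relying on syms(NULL) returning an empty list so that group_by receives nothing to group on. The data frame toy and the seed are assumptions made only so the example runs on its own, and the function body is repeated from above for the same reason.

```r
library(dplyr)
library(tidyr)
library(rlang)

# sym() converts one string to a symbol; syms() converts a character vector
time_sym   <- sym("time")
group_syms <- syms(c("group1", "group2"))
no_groups  <- syms(NULL)  # empty list, so !!!no_groups splices in nothing

# Hypothetical toy data (seeded only to make the example self-contained)
set.seed(1)
toy <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10))

# Same function as above, repeated so this block runs on its own
test_function <- function(data, time_var, group_vars = NULL) {
  p <- c(1:9/10, 0.95)
  time_var <- sym(time_var)
  group_vars <- syms(group_vars)
  data %>%
    group_by(!!!group_vars) %>%
    summarize(po = list(quantile(!!time_var / 60, p, na.rm = TRUE))) %>%
    ungroup() %>%
    unnest_wider(po)
}

# One row per combination of group1 and group2
grouped <- test_function(toy, "time", c("group1", "group2"))

# group_vars left at NULL: a single row of quantiles over all the data
ungrouped <- test_function(toy, "time")
```

With group_vars left at its default, the same pipeline should fall back to a single row of quantiles, so no separate branch for the ungrouped case appears to be needed.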