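I want to sample rows of a data frame by group. But here's the catch: I want to sample a different number of records for each group, based on data in another table. Here is my reproducible data: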
```r
library(tidyverse)

df <- data_frame(
  Stratum = rep(c("High","Medium","Low"), 10),
  id = c(1:30),
  Value = runif(30)
)

sampleGuide <- data_frame(
  Stratum = c("High","Medium","Low"),
  Surveys = c(3,2,5)
)
```
The output should look like this:
```
# A tibble: 10 × 2
   Stratum      Value
1     High 0.21504972
2     High 0.71069005
3     High 0.09286843
4   Medium 0.52553056
5   Medium 0.06682459
6      Low 0.38793128
7      Low 0.01285081
8      Low 0.87865734
9      Low 0.09100829
10     Low 0.14851919
```
Here is my non-working attempt:
```r
> df %>%
+   left_join(sampleGuide, by = "Stratum") %>%
+   group_by(Stratum) %>%
+   sample_n(unique(Surveys))
Error in unique(Surveys) : object 'Surveys' not found
```
I also tried:
```r
> df %>%
+   group_by(Stratum) %>%
+   nest() %>%
+   left_join(sampleGuide, by = "Stratum") %>%
+   mutate(sample = map(., ~ sample_n(data, Surveys)))
Error in mutate_impl(.data, dots) : Don't know how to sample from objects of class function
```
It seems `sample_n` needs `size` to be a single number. Any ideas?
I'm only looking for tidyverse solutions. Bonus points for purrr!
Here is a similar question, but I'm not satisfied with the accepted answer, because IRL the number of strata I'm working with is large.
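Because `sample_n()` takes a single `size`, one alternative that stays inside dplyr and never lists the strata by hand is to join the target counts onto every row and then draw row indices inside a grouped `slice()`, where `n()` is the group size and `first(Surveys)` is that group's target. This is a sketch of a different technique from the `map2()` answer, not the asker's code. It assumes every stratum appears in `sampleGuide` and has at least `Surveys` rows (otherwise `sample()` errors), and it uses `tibble()` in place of the deprecated `data_frame()`.

```r
library(dplyr)

set.seed(42)  # only so the draw is repeatable
df <- tibble(
  Stratum = rep(c("High", "Medium", "Low"), 10),
  id = 1:30,
  Value = runif(30)
)
sampleGuide <- tibble(
  Stratum = c("High", "Medium", "Low"),
  Surveys = c(3, 2, 5)
)

sampled <- df %>%
  # attach each stratum's target sample size to every row
  left_join(sampleGuide, by = "Stratum") %>%
  group_by(Stratum) %>%
  # per group: pick first(Surveys) distinct row positions out of 1:n()
  slice(sample(n(), first(Surveys))) %>%
  ungroup() %>%
  select(Stratum, Value)

sampled
```

Since the per-group size is just a column, this works the same way however many strata `sampleGuide` contains.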
Figured it out with `map2()` from purrr:
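```r
df %>%
  nest(data = -Stratum) %>%                          # one row per stratum, its rows in list-column `data`
  left_join(sampleGuide, by = "Stratum") %>%         # add each stratum's `Surveys` count
  mutate(Sample = map2(data, Surveys, sample_n)) %>% # sample_n(data[[i]], Surveys[[i]]) per stratum
  select(-data) %>%                                  # drop the full nested rows before unnesting
  unnest(Sample)
```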
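On dplyr 1.0.0 or later, `sample_n()` is superseded by `slice_sample()`, so the same `map2()` idea can be sketched with an explicit lambda, where `.x` is one stratum's nested rows and `.y` is its `Surveys` value. This is a hedged adaptation of the answer above, assuming tidyr 1.0 or later (for the `nest(data = -Stratum)` syntax); `set.seed()` is added only to make the draw repeatable.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)
df <- tibble(
  Stratum = rep(c("High", "Medium", "Low"), 10),
  id = 1:30,
  Value = runif(30)
)
sampleGuide <- tibble(
  Stratum = c("High", "Medium", "Low"),
  Surveys = c(3, 2, 5)
)

sampled <- df %>%
  nest(data = -Stratum) %>%
  left_join(sampleGuide, by = "Stratum") %>%
  # map2() walks `data` and `Surveys` in parallel, one stratum at a time
  mutate(Sample = map2(data, Surveys, ~ slice_sample(.x, n = .y))) %>%
  select(Stratum, Sample) %>%
  unnest(Sample)

count(sampled, Stratum)
```

Keeping only `Stratum` and `Sample` before `unnest()` leaves one row per sampled record, so `count(sampled, Stratum)` should report High = 3, Medium = 2 and Low = 5.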