I have a data frame (df) with three columns, as shown below:

Structure:

    id  id1  age
    A1  a1   32
    A1  a2   45
    A1  a3   45
    A1  a4   12
    A2  b1   15
    A2  b5   34
    A2  b64  17
Expected output:

    id  count  count1
    A1  4      1
    A2  3      2
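Logic:

- Column "count" is the number of times each "id" occurs.
- Column "count1" is the number of rows for that "id" where age is less than 21.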
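For reproducibility, here is a sketch that rebuilds the sample data in R and checks the stated logic against the expected output using base R's tapply. It assumes id and id1 are character columns and age is numeric; the variable name expected is only illustrative.

```r
# Sample data from the question (id and id1 assumed to be character columns)
df <- data.frame(
  id  = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  id1 = c("a1", "a2", "a3", "a4", "b1", "b5", "b64"),
  age = c(32, 45, 45, 12, 15, 34, 17)
)

# count: number of rows per id
count <- tapply(df$id, df$id, length)
# count1: number of rows per id with age < 21 (sum of a logical vector)
count1 <- tapply(df$age < 21, df$id, sum)

expected <- data.frame(id = names(count),
                       count = as.vector(count),
                       count1 = as.vector(count1))
expected
#>   id count count1
#> 1 A1     4      1
#> 2 A2     3      2
```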
Current code:

    library(dplyr)
    df_summarized <- df %>%
      group_by(id) %>%
      summarise(count = n(), count1 = count(age < 21))
Problem:
Error: no applicable method for 'group_by_' applied to an object of class "logical"
Answer (akrun):
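We need to use sum:

    library(dplyr)
    df %>%
      group_by(id) %>%
      # n() counts the rows in each group; sum() of a logical vector counts the TRUE values
      summarise(count = n(), count1 = sum(age < 21))
    # A tibble: 2 × 3
    #   id    count count1
    # 1 A1        4      1
    # 2 A2        3      2

This is because count works on a data.frame or tbl_df, not on a single column inside summarise.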
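Why sum gives a count: when R sums a logical vector it coerces TRUE to 1 and FALSE to 0, so sum(age < 21) is the number of rows that satisfy the condition. A minimal illustration, using the A1 ages from the sample data:

```r
# Ages for id "A1" in the sample data
ages <- c(32, 45, 45, 12)

is_young <- ages < 21   # logical vector: FALSE FALSE FALSE TRUE
sum(is_young)           # TRUE is coerced to 1 and FALSE to 0
#> [1] 1
length(is_young)        # counts all elements, not just the TRUE ones
#> [1] 4
```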
Or using data.table:

    library(data.table)
    # setDT() converts df to a data.table by reference; .N is the number of rows per group
    setDT(df)[, .(count = .N, count1 = sum(age < 21)), id]
Or with base R:

    cbind(count  = rowSums(table(df[-2])),
          count1 = as.vector(rowsum(+(df$age < 21), df$id)))
    #    count count1
    # A1     4      1
    # A2     3      2
Or use aggregate with sum:

    do.call(data.frame,
            aggregate(age ~ id, df,
                      FUN = function(x) c(count = length(x), count1 = sum(x < 21))))
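Note: all of the above methods return proper columns in the result. This needs special mention for aggregate: because FUN returns two values per group, the output column is a matrix, which is why we wrap the call in do.call(data.frame, ...) to convert it into proper columns.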
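To see the matrix column the note refers to, here is a small sketch on the sample data (id1 is omitted because the formula does not use it; res and flat are illustrative names). It inspects the raw aggregate result and then the flattened version, where data.frame() names the split columns by prefixing the matrix column name.

```r
df <- data.frame(
  id  = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  age = c(32, 45, 45, 12, 15, 34, 17)
)

res <- aggregate(age ~ id, df,
                 FUN = function(x) c(count = length(x), count1 = sum(x < 21)))

ncol(res)           # 2: "id" plus a single matrix column "age"
is.matrix(res$age)  # TRUE, with column names "count" and "count1"

# Splitting the matrix into ordinary columns
flat <- do.call(data.frame, res)
names(flat)         # "id" "age.count" "age.count1"
```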