I have the following dataset:
    sample.data <- data.frame(Step = c(1,2,3,4,1,2,1,2,3,1,1),
                              Case = c(1,1,1,1,2,2,3,3,3,4,5),
                              Decision = c("Referred","Referred","Referred","Approved","Referred","Declined",
                                           "Referred","Referred","Declined","Approved","Declined"))
    sample.data

       Step Case Decision
    1     1    1 Referred
    2     2    1 Referred
    3     3    1 Referred
    4     4    1 Approved
    5     1    2 Referred
    6     2    2 Declined
    7     1    3 Referred
    8     2    3 Referred
    9     3    3 Declined
    10    1    4 Approved
    11    1    5 Declined
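Is it possible in R to convert this to a wide table format, with the decisions as the column headers and each cell holding the count of occurrences? For example: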
    Case Referred Approved Declined
       1        3        1        0
       2        1        0        1
       3        2        0        1
       4        0        1        0
       5        0        0        1
Jaap.. 13
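The aggregation parameter of the dcast function in the reshape2 package defaults to length (= count). The data.table package implements an improved version of the dcast function. So in your case this would be: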
    library('reshape2') # or library('data.table')
    newdf <- dcast(sample.data, Case ~ Decision)

Or, with the parameters stated explicitly:

    newdf <- dcast(sample.data, Case ~ Decision, value.var = "Decision", fun.aggregate = length)
This gives the following data frame:

    > newdf
      Case Approved Declined Referred
    1    1        1        0        3
    2    2        0        1        1
    3    3        0        1        2
    4    4        1        0        0
    5    5        0        1        0
If you do not specify the aggregation function, you get a message telling you that dcast is using length as the default.
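For the data.table route mentioned in the comment above, here is a minimal sketch. It assumes the data.table package is installed, and it converts the data frame with as.data.table() first, since data.table's dcast method is written for data.table objects. The names dt and newdt are only illustrative.

```r
library(data.table)

# Same data as in the question, rebuilt so the snippet runs on its own
sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred", "Declined",
               "Referred", "Referred", "Declined", "Approved", "Declined")
)

# Convert to a data.table before reshaping (assumption: a copy is acceptable here)
dt <- as.data.table(sample.data)

# One row per Case, one column per Decision, cells hold the number of rows
newdt <- dcast(dt, Case ~ Decision, value.var = "Decision", fun.aggregate = length)
newdt
```

The counts should match the reshape2 result above, returned as a data.table instead of a plain data frame.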
You can accomplish this with a simple table() call. Setting the factor levels lets you control the order of the columns in the result.
    sample.data$Decision <- factor(x = sample.data$Decision, levels = c("Referred","Approved","Declined"))
    table(Case = sample.data$Case, sample.data$Decision)

    Case Referred Approved Declined
       1        3        1        0
       2        1        0        1
       3        2        0        1
       4        0        1        0
       5        0        0        1
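table() returns a table object, not a data frame. If a wide data frame with Case as an ordinary column is needed, as the question asks, one possible base R sketch is below. The names counts and wide are only illustrative, and the conversion through as.data.frame.matrix() is one option among several.

```r
# Same data as in the question, rebuilt so the snippet runs on its own
sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred", "Declined",
               "Referred", "Referred", "Declined", "Approved", "Declined")
)

# Fix the column order through the factor levels
sample.data$Decision <- factor(sample.data$Decision,
                               levels = c("Referred", "Approved", "Declined"))

# Cross-tabulate, then turn the 5 x 3 table into a data frame of counts
counts <- table(Case = sample.data$Case, Decision = sample.data$Decision)
wide <- as.data.frame.matrix(counts)

# Move the case ids from the row names into a regular column
wide <- data.frame(Case = as.numeric(rownames(wide)), wide, row.names = NULL)
wide
```

xtabs(~ Case + Decision, data = sample.data) produces the same counts through a formula interface, if that reads better.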
Here is a dplyr + tidyr approach:
    if (!require("pacman")) install.packages("pacman")
    pacman::p_load(dplyr, tidyr)

    sample.data %>%
        count(Case, Decision) %>%
        spread(Decision, n, fill = 0)

    ##    Case Approved Declined Referred
    ##   (dbl)    (dbl)    (dbl)    (dbl)
    ## 1     1        1        0        3
    ## 2     2        0        1        1
    ## 3     3        0        1        2
    ## 4     4        1        0        0
    ## 5     5        0        1        0
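In newer tidyr releases spread() is superseded by pivot_wider(). A rough equivalent of the pipeline above is sketched here, assuming dplyr and a tidyr version recent enough to accept a scalar values_fill. The name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so the snippet runs on its own
sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred", "Declined",
               "Referred", "Referred", "Declined", "Approved", "Declined")
)

wide <- sample.data %>%
  count(Case, Decision) %>%              # one row per Case/Decision pair, count in n
  pivot_wider(names_from = Decision,     # decisions become column headers
              values_from = n,           # cells hold the counts
              values_fill = 0)           # pairs that never occur become 0
wide
```

The column order may differ from the spread() output, so select the columns by name if the order matters.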