I have a data frame:
df=data.frame(x=rnorm(8),y=runif(8),longstring=c("foo_100_Case1","foo_125_Case1","bar_100_Case1","bar_125_Case1","foo_100_Case2","foo_125_Case2","bar_100_Case2","bar_125_Case2"),stringsAsFactors = F)
I need to split the last column into three columns, using "_" as the separator. So far I have been doing the following:
a=matrix(unlist(strsplit(df$longstring,"_",fixed=T)),8,3,byrow = T)
df$type=a[,1]
df$point=a[,2]
df$case=a[,3]
But I wonder whether there is a simpler way. Combining strsplit and unlist like this is particularly clumsy, and it does not make the code very readable.
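Staying in base R, one way to avoid hard-coding the 8 and the 3 is to let `do.call(rbind, ...)` stack the pieces that `strsplit` returns. This is a minimal sketch that assumes every string splits into exactly three pieces. The column names `type`, `point` and `case` come from the question, and the `df` definition is repeated so the block runs on its own.

```r
df <- data.frame(
  x = rnorm(8),
  y = runif(8),
  longstring = c("foo_100_Case1", "foo_125_Case1", "bar_100_Case1", "bar_125_Case1",
                 "foo_100_Case2", "foo_125_Case2", "bar_100_Case2", "bar_125_Case2"),
  stringsAsFactors = FALSE
)

# Split each string on a literal "_"; the result is a list of character vectors
parts <- strsplit(df$longstring, "_", fixed = TRUE)

# Stack the pieces row-wise into a character matrix, one row per input string.
# Assumption: every string yields the same number of pieces (three here).
a <- do.call(rbind, parts)

# Add the three pieces as new character columns in one assignment
df[c("type", "point", "case")] <- as.data.frame(a, stringsAsFactors = FALSE)
```

The number of rows and columns now follows from the data instead of being typed in by hand.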
Here are some options to try:
My "splitstackshape" package is designed for exactly this kind of thing...
library(splitstackshape)
cSplit(df, "longstring", "_")
#              x         y longstring_1 longstring_2 longstring_3
# 1: -1.41524742 0.2123978          foo          100        Case1
# 2: -1.09240237 0.3899935          foo          125        Case1
# 3:  0.39675025 0.2162463          bar          100        Case1
# 4: -1.14996728 0.7608128          bar          125        Case1
# 5: -0.07657172 0.6878348          foo          100        Case2
# 6:  0.29549599 0.2216566          foo          125        Case2
# 7:  1.78622612 0.1496666          bar          100        Case2
# 8: -0.11749579 0.9255409          bar          125        Case2
The "data.table" package gives us the fast tstrsplit function...
library(data.table)
as.data.table(df)[ , paste0("V", 1:3) := tstrsplit(longstring, "_")][
  , longstring := NULL][]
If you have time and are willing to wait for read.table to do its work...
cbind(df[1:2], read.table(text = df$longstring, sep = "_"))
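By default `read.table` guesses each column's type, so the middle piece comes back as integer and the new columns are named `V1` to `V3`. The sketch below names the columns and pins their types explicitly with the `col.names` and `colClasses` arguments. The names `type`, `point` and `case` are an illustrative choice taken from the question.

```r
df <- data.frame(
  x = rnorm(8),
  y = runif(8),
  longstring = c("foo_100_Case1", "foo_125_Case1", "bar_100_Case1", "bar_125_Case1",
                 "foo_100_Case2", "foo_125_Case2", "bar_100_Case2", "bar_125_Case2"),
  stringsAsFactors = FALSE
)

# Each element of df$longstring is read as one line of "_"-separated text.
# col.names names the new columns; colClasses fixes their types up front.
parts <- read.table(
  text = df$longstring,
  sep = "_",
  col.names = c("type", "point", "case"),
  colClasses = c("character", "integer", "character")
)

# Keep the numeric columns of df and append the split pieces
out <- cbind(df[1:2], parts)
```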
If you need something else that is fast...
library(iotools)
cbind(df[1:2], mstrsplit(df$longstring, sep = "_"))
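If avoiding add-on packages matters more than speed, `utils::strcapture` (part of base R's utils package in recent R versions) extracts the pieces with a regular expression and converts them to the types of a prototype data frame. This is a sketch that assumes the pattern below matches every string. Strings that do not match come back as a row of `NA` values instead of shifting columns.

```r
df <- data.frame(
  x = rnorm(8),
  y = runif(8),
  longstring = c("foo_100_Case1", "foo_125_Case1", "bar_100_Case1", "bar_125_Case1",
                 "foo_100_Case2", "foo_125_Case2", "bar_100_Case2", "bar_125_Case2"),
  stringsAsFactors = FALSE
)

# The prototype fixes the names and types of the captured columns
# (names taken from the question; point is converted to integer).
proto <- data.frame(type = character(), point = integer(), case = character(),
                    stringsAsFactors = FALSE)

# Each parenthesised group captures one run of characters without "_"
parts <- utils::strcapture("^([^_]+)_([^_]+)_([^_]+)$", df$longstring, proto)

# Keep the numeric columns of df and append the captured pieces
out <- cbind(df[1:2], parts)
```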