我有一个程序,以这种格式给我数据
toy file_path Condition Trial.Num A B C ID A B C ID A B C ID 1 root/some.extension Baseline 1 2 3 5 car 2 1 7 bike 4 9 0 plane 2 root/thing.extension Baseline 2 3 6 45 car 5 4 4 bike 9 5 4 plane 3 root/else.extension Baseline 3 4 4 6 car 7 5 4 bike 68 7 56 plane 4 root/uniquely.extension Treatment 1 5 3 7 car 1 7 37 bike 9 8 7 plane 5 root/defined.extension Treatment 2 6 7 3 car 4 6 8 bike 9 0 8 plane
我的目标是将格式整理成至少可以通过重塑具有唯一列名来最终整理的东西
tidy_toy file_path Condition Trial.Num A B C ID 1 root/some.extension Baseline 1 2 3 5 car 2 root/thing.extension Baseline 2 3 6 45 car 3 root/else.extension Baseline 3 4 4 6 car 4 root/uniquely.extension Treatment 1 5 3 7 car 5 root/defined.extension Treatment 2 6 7 3 car 6 root/some.extension Baseline 1 2 1 7 bike 7 root/thing.extension Baseline 2 5 4 4 bike 8 root/else.extension Baseline 3 7 5 4 bike 9 root/uniquely.extension Treatment 1 1 7 37 bike 10 root/defined.extension Treatment 2 4 6 8 bike 11 root/some.extension Baseline 1 4 9 0 plane 12 root/thing.extension Baseline 2 9 5 4 plane 13 root/else.extension Baseline 3 68 7 56 plane 14 root/uniquely.extension Treatment 1 9 8 7 plane 15 root/defined.extension Treatment 2 9 0 8 plane
如果我尝试melt
从toy
它不起作用,因为只有第一个ID列将被使用id.vars
(因此一切都将被标记为汽车).相同的变量将被删除.
这是两个表的输入
structure(list(file_path = structure(c(3L, 4L, 2L, 5L, 1L), .Label = c("root/defined.extension", "root/else.extension", "root/some.extension", "root/thing.extension", "root/uniquely.extension"), class = "factor"), Condition = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("Baseline", "Treatment"), class = "factor"), Trial.Num = c(1L, 2L, 3L, 1L, 2L), A = 2:6, B = c(3L, 6L, 4L, 3L, 7L), C = c(5L, 45L, 6L, 7L, 3L), ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "car", class = "factor"), A = c(2L, 5L, 7L, 1L, 4L), B = c(1L, 4L, 5L, 7L, 6L), C = c(7L, 4L, 4L, 37L, 8L), ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "bike", class = "factor"), A = c(4L, 9L, 68L, 9L, 9L), B = c(9L, 5L, 7L, 8L, 0L), C = c(0L, 4L, 56L, 7L, 8L), ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "plane", class = "factor")), .Names = c("file_path", "Condition", "Trial.Num", "A", "B", "C", "ID", "A", "B", "C", "ID", "A", "B", "C", "ID"), class = "data.frame", row.names = c(NA, -5L)) structure(list(file_path = structure(c(3L, 4L, 2L, 5L, 1L, 3L, 4L, 2L, 5L, 1L, 3L, 4L, 2L, 5L, 1L), .Label = c("root/defined.extension", "root/else.extension", "root/some.extension", "root/thing.extension", "root/uniquely.extension"), class = "factor"), Condition = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L), .Label = c("Baseline", "Treatment"), class = "factor"), Trial.Num = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L), A = c(2L, 3L, 4L, 5L, 6L, 2L, 5L, 7L, 1L, 4L, 4L, 9L, 68L, 9L, 9L), B = c(3L, 6L, 4L, 3L, 7L, 1L, 4L, 5L, 7L, 6L, 9L, 5L, 7L, 8L, 0L), C = c(5L, 45L, 6L, 7L, 3L, 7L, 4L, 4L, 37L, 8L, 0L, 4L, 56L, 7L, 8L), ID = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L), .Label = c("bike", "car", "plane"), class = "factor")), .Names = c("file_path", "Condition", "Trial.Num", "A", "B", "C", "ID"), class = "data.frame", row.names = c(NA, -15L))
Jaap.. 11
您可以使用make.unique
-function创建唯一的列名称.之后,你可以使用melt
从data.table
它能够创建基于多重价值列-package patterns
在COLUMNNAMES:
# make the column names unique names(toy) <- make.unique(names(toy)) # let the 'Condition' column start with a small letter 'c' # so it won't be detected by the patterns argument from melt names(toy)[2] <- tolower(names(toy)[2]) # load the 'data.table' package library(data.table) # tidy the data into long format tidy_toy <- melt(setDT(toy), measure.vars = patterns('^A','^B','^C','^ID'), value.name = c('A','B','C','ID'))
这使:
> tidy_toy file_path condition Trial.Num variable A B C ID 1: root/some.extension Baseline 1 1 2 3 5 car 2: root/thing.extension Baseline 2 1 3 6 45 car 3: root/else.extension Baseline 3 1 4 4 6 car 4: root/uniquely.extension Treatment 1 1 5 3 7 car 5: root/defined.extension Treatment 2 1 6 7 3 car 6: root/some.extension Baseline 1 2 2 1 7 bike 7: root/thing.extension Baseline 2 2 5 4 4 bike 8: root/else.extension Baseline 3 2 7 5 4 bike 9: root/uniquely.extension Treatment 1 2 1 7 37 bike 10: root/defined.extension Treatment 2 2 4 6 8 bike 11: root/some.extension Baseline 1 3 4 9 0 plane 12: root/thing.extension Baseline 2 3 9 5 4 plane 13: root/else.extension Baseline 3 3 68 7 56 plane 14: root/uniquely.extension Treatment 1 3 9 8 7 plane 15: root/defined.extension Treatment 2 3 9 0 8 plane
另一种选择是使用列索引列表measure.vars
:
tidy_toy <- melt(setDT(toy), measure.vars = list(c(4,8,12), c(5,9,13), c(6,10,14), c(7,11,15)), value.name = c('A','B','C','ID'))
因此,不需要使列名称唯一.
一种更复杂的方法,可以创建可以通过patterns
参数更好地区分的名称:
# select the names that are not unique tt <- table(names(toy)) idx <- which(names(toy) %in% names(tt)[tt > 1]) nms <- names(toy)[idx] # make them unique names(toy)[idx] <- paste(nms, rep(seq(length(nms) / length(names(tt)[tt > 1])), each = length(names(tt)[tt > 1])), sep = '.') # your columnnames are now unique: > names(toy) [1] "file_path" "Condition" "Trial.Num" "A.1" "B.1" "C.1" "ID.1" "A.2" [9] "B.2" "C.2" "ID.2" "A.3" "B.3" "C.3" "ID.3" # tidy the data into long format tidy_toy <- melt(setDT(toy), measure.vars = patterns('^A.\\d','^B.\\d','^C.\\d','^ID.\\d'), value.name = c('A','B','C','ID'))
这将产生相同的最终结果.
正如评论中所提到的,janitor
-package也可以帮助解决这个问题.在clean_names()
为类似工程make.unique
的功能.请看这里的解释.
您可以使用make.unique
-function创建唯一的列名称.之后,你可以使用melt
从data.table
它能够创建基于多重价值列-package patterns
在COLUMNNAMES:
# make the column names unique names(toy) <- make.unique(names(toy)) # let the 'Condition' column start with a small letter 'c' # so it won't be detected by the patterns argument from melt names(toy)[2] <- tolower(names(toy)[2]) # load the 'data.table' package library(data.table) # tidy the data into long format tidy_toy <- melt(setDT(toy), measure.vars = patterns('^A','^B','^C','^ID'), value.name = c('A','B','C','ID'))
这使:
> tidy_toy file_path condition Trial.Num variable A B C ID 1: root/some.extension Baseline 1 1 2 3 5 car 2: root/thing.extension Baseline 2 1 3 6 45 car 3: root/else.extension Baseline 3 1 4 4 6 car 4: root/uniquely.extension Treatment 1 1 5 3 7 car 5: root/defined.extension Treatment 2 1 6 7 3 car 6: root/some.extension Baseline 1 2 2 1 7 bike 7: root/thing.extension Baseline 2 2 5 4 4 bike 8: root/else.extension Baseline 3 2 7 5 4 bike 9: root/uniquely.extension Treatment 1 2 1 7 37 bike 10: root/defined.extension Treatment 2 2 4 6 8 bike 11: root/some.extension Baseline 1 3 4 9 0 plane 12: root/thing.extension Baseline 2 3 9 5 4 plane 13: root/else.extension Baseline 3 3 68 7 56 plane 14: root/uniquely.extension Treatment 1 3 9 8 7 plane 15: root/defined.extension Treatment 2 3 9 0 8 plane
另一种选择是使用列索引列表measure.vars
:
tidy_toy <- melt(setDT(toy), measure.vars = list(c(4,8,12), c(5,9,13), c(6,10,14), c(7,11,15)), value.name = c('A','B','C','ID'))
因此,不需要使列名称唯一.
一种更复杂的方法,可以创建可以通过patterns
参数更好地区分的名称:
# select the names that are not unique tt <- table(names(toy)) idx <- which(names(toy) %in% names(tt)[tt > 1]) nms <- names(toy)[idx] # make them unique names(toy)[idx] <- paste(nms, rep(seq(length(nms) / length(names(tt)[tt > 1])), each = length(names(tt)[tt > 1])), sep = '.') # your columnnames are now unique: > names(toy) [1] "file_path" "Condition" "Trial.Num" "A.1" "B.1" "C.1" "ID.1" "A.2" [9] "B.2" "C.2" "ID.2" "A.3" "B.3" "C.3" "ID.3" # tidy the data into long format tidy_toy <- melt(setDT(toy), measure.vars = patterns('^A.\\d','^B.\\d','^C.\\d','^ID.\\d'), value.name = c('A','B','C','ID'))
这将产生相同的最终结果.
正如评论中所提到的,janitor
-package也可以帮助解决这个问题.在clean_names()
为类似工程make.unique
的功能.请看这里的解释.