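Question:
I'm working in R. I want the shared columns of 2 data.tables (shared meaning same column names with the same meaning) to have matching classes. I'm struggling to find a general way to convert an object of unknown class to the unknown class of another object.
More context:
I know how to set the class of a column in a data.table, and I'm aware of the as function. Also, this question isn't entirely data.table-specific, but it comes up often when I use data.tables. Further, assume that the desired coercion is possible.
I have 2 data.tables. They share some column names, and those columns are intended to represent the same information. For the column names shared by table A and table B, I want the classes in A to match the classes in B (or the other way around).
Example data.tables: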
library(data.table)

A <- structure(list(
  year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
  stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L)),
  .Names = c("year", "stratum"), row.names = c(NA, -45L), class = c("data.table", "data.frame"))

B <- structure(list(
  year = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
  stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L),
  bt = c(-9.95187702337873, -9.48946944434626, -9.74178662514147, -5.36167545158338, -4.76405522202426, -5.41964239804882, -0.0807951335119085, 0.520481719699774, 0.0393874225863578, 5.40557402913123, 5.47927931969583, 5.37228402911139, 9.82774396910091, 9.89629694010177, 9.98105260936272, -9.82469892896284, -9.42530210357904, -9.66171049964775, -5.17540952901709, -4.81859082470115, -5.3577146169737, -0.0685310909609001, 0.441383303157166, -0.0105897444321987, 5.24205882775199, 5.65773605162835, 5.40217185632441, 9.90299445851434, 9.78883672575814, 9.98747998379124, -9.69843398105195, -9.31530717395811, -9.77406601252698, -4.83080164375344, -4.89056304189872, -5.3904000267275, -0.121508487954861, 0.493798577602088, -0.118550709142654, 5.23654772583187, 5.87760447006892, 5.22478092346285, 9.90949768116403, 9.85433376398086, 9.91619307289277),
  yr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)),
  .Names = c("year", "stratum", "bt", "yr"), row.names = c(NA, -45L), class = c("data.table", "data.frame"),
  sorted = c("year", "stratum"))
Here is what they look like (first rows only):
> A
   year stratum
1:    1       1
2:    1       2
3:    1       3
4:    1       4
> B
   year stratum          bt yr
1:    1       1 -9.95187702  1
2:    1       2 -9.48946944  1
3:    1       3 -9.74178663  1
4:    1       4 -5.36167545  1
Here are the classes:
> sapply(A, class)
     year   stratum
"integer" "integer"
> sapply(B, class)
     year   stratum        bt        yr
"numeric" "integer" "numeric" "numeric"
Manually, I can accomplish the desired task with:
A[,year:=as.numeric(year)]
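This is easy when only 1 column needs to change, you know that column ahead of time, and you know the desired class ahead of time. It is also fairly easy to convert arbitrary columns to any given class.
My failed attempt:
(Edit: this actually works; see my answer)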
s2c <- function (x, type = "list") {
  as.call(lapply(c(type, x), as.symbol))
}

# In this case, I can assume all columns of A can be found in B
# I am also able to assume that the desired conversion is possible
B.class <- sapply(B[,eval(s2c(names(A)))], class)
for(col in names(A)){
  set(A, j=col, value=as(A[[col]], B.class[col]))
}
But this still returns the year column as "integer", not "numeric":
> sapply(A, class)
     year   stratum
"integer" "integer"
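The problem in the example above is that class(as(1L, "numeric")) still returns "integer". On the other hand, class(as.numeric(1L)) returns "numeric"; however, I don't know ahead of time that as.numeric is the function I will need.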
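The question, restated:
How do I make column classes match when neither the columns nor the to/from classes are known ahead of time?
Additional thoughts: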
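In a way, the question is mostly about arbitrary class matching. I run into this often with data.table because it is very vocal about class matching. For example, I hit a similar problem when I need to insert an NA of the appropriate type (NA_real_ vs NA_character_, etc.), depending on the class of the column (see the related question/issue in this question).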
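Again, this question can be seen as the general problem of converting between arbitrary classes that aren't known in advance. In the past, I've written functions that use switch to do something like switch(class(x), double = as.numeric(...), character = as.character(...), ...), but that seems ugly. The only reason I'm raising this in the context of data.table is that this is where I most often need this kind of functionality.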
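An alternative to a hand-written switch is to look the converter up by name. The following is a minimal sketch, assuming that an as.<class> function exists for the first class of the target (true for common base classes such as numeric, integer, character, logical and factor); coerce_like is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper: coerce `x` to the (first) class of `target`
# by finding the function named "as.<class>".
coerce_like <- function(x, target) {
  converter <- match.fun(paste0("as.", class(target)[1L]))
  converter(x)
}

int_to_num <- coerce_like(1L, 2.5)                    # uses as.numeric
int_to_chr <- coerce_like(1:3, "a")                   # uses as.character
chr_to_fct <- coerce_like(c("a", "b"), factor("z"))   # uses as.factor
```

match.fun stops with an error when no such function exists, so an unsupported target class fails loudly rather than silently returning the input unchanged.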
Here is a very crude way to ensure common classes:
library(data.table)
library(magrittr)

# columns present in both tables
cols = intersect(names(A), names(B))

# stack the shared columns, then let type.convert pick one class per column
r = rbindlist(list(A = A, B = B[,cols,with=FALSE]), idcol = TRUE)
r[, (cols) := lapply(.SD, . %>% as.character %>% type.convert), .SDcols=cols]

# write the converted columns back to each table
B[, (cols) := r[.id=="B", cols, with=FALSE]]
A[, (cols) := r[.id=="A", cols, with=FALSE]]

sapply(A, class); sapply(B, class)
#      year   stratum
# "integer" "integer"
#      year   stratum        yr
# "integer" "integer" "numeric"
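What I don't like about this solution:
I often use all-digit codes for IDs (like "00001", "02995"), and this would coerce them to actual integers, which is bad.
Who knows what this will do to fancy classes like Date or factor? I suppose that doesn't matter if you do this column-class normalization right after reading the data in.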
Data:
# slightly tweaked from OP
library(data.table)

A <- setDT(structure(list(
  year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
  stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L)),
  .Names = c("year", "stratum"), row.names = c(NA, -45L), class = c("data.frame")))

B <- setDT(structure(list(
  year = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
  stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L),
  yr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)),
  .Names = c("year", "stratum", "yr"), row.names = c(NA, -45L), class = c("data.frame")))
Comment. If you have something against magrittr, use function(x) type.convert(as.character(x)) in place of the . %>% bit.
Not very elegant, but you can "build" the as.* call like this:
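# Build the name of the matching as.* function from the class of B's column
# and call it on A's column. [[x]] is used instead of [, x] so that the loop
# works on a data.table as well as on a data.frame.
for (x in colnames(A)) {
  A[[x]] <- eval(call(paste0("as.", class(B[[x]])[1]), A[[x]]))
}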
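The same idea can be written in a more data.table-flavoured way by updating columns by reference with set and restricting the loop to the shared columns. This is a sketch under assumptions, not a definitive implementation: match_classes is a hypothetical helper name, the two small tables are made-up stand-ins for the data in the question, and an as.<class> function is assumed to exist for each target class.

```r
library(data.table)

# Small illustrative tables (not the data from the question)
A <- data.table(year = c(1L, 2L, 3L), stratum = c(1L, 2L, 3L))
B <- data.table(year = c(1, 2, 3), stratum = c(1L, 2L, 3L), bt = c(-9.9, 0.5, 9.8))

# Hypothetical helper: coerce the columns of `x` that also appear in `ref`
# to the class they have in `ref`, modifying `x` by reference.
match_classes <- function(x, ref) {
  for (col in intersect(names(x), names(ref))) {
    target <- class(ref[[col]])[1L]
    if (!identical(class(x[[col]])[1L], target)) {
      converter <- match.fun(paste0("as.", target))
      set(x, j = col, value = converter(x[[col]]))
    }
  }
  invisible(x)
}

match_classes(A, B)
sapply(A, class)
```

Columns whose class already matches are skipped, so only year is replaced in this example; stratum is left as it was.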