In R, how can I best vectorize this operation?
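I have a reference table of values with lower (A) and upper (B) limits.
I also have a table of values (X) to look up against the reference table.
For each value of X, I need to determine whether it lies between A and B for any row of the reference table.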
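For a single lookup value, the condition described above can be sketched as follows. The vectors `A` and `B`, the helper name `is_in_any`, and the test values are made-up illustration data, not the tables from the question; the half-open interval `A <= v < B` is taken from the comment in the loop code.

```r
# Hypothetical reference limits (illustration only)
A <- c(0.30, 0.45)
B <- c(0.50, 0.70)

# TRUE if v falls in at least one half-open interval [A[i], B[i])
is_in_any <- function(v, A, B) any(v >= A & v < B)

is_in_any(0.60, A, B)  # inside the second interval
is_in_any(0.80, A, B)  # outside both intervals
```

The comparison `v >= A & v < B` is already vectorized over the reference rows; what remains is doing it for every lookup value without an explicit loop.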
To demonstrate, here is a solution that uses a loop:
    # For reproducibility
    set.seed(1)

    # Set up the reference and lookup tables
    nref <- 5
    nlook <- 10
    referenceTable <- data.frame(A = runif(nref, min = 0.25, max = 0.5),
                                 B = runif(nref, min = 0.50, max = 0.75))
    lookupTable <- data.frame(X = runif(nlook), IsIn = 0)

    # Process each row in the lookup table:
    # search for at least one match in the reference table where A <= X < B
    for (x in 1:nrow(lookupTable)) {
      v <- lookupTable$X[x]
      tmp <- subset(referenceTable, v >= A & v < B)
      lookupTable[x, 'IsIn'] <- as.integer(nrow(tmp) > 0)
    }
I am trying to get rid of the for(x in .... ) component, because my real-life tables contain thousands of records.
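I couldn't find an exact dupe, so here is a possible solution using data.table::foverlaps. First, we need to add an additional column to lookupTable so that it has a boundary on both sides. Then we key referenceTable (necessary for foverlaps to work), and then we just run a simple overlap join, selecting only the first match, since any match will do. (I use 0^ to convert the result to binary, because you don't want the actual locations.)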
    library(data.table)
    setDT(lookupTable)[, Y := X]     # Add an additional boundary column
    setkey(setDT(referenceTable))    # Key the referenceTable data set
    lookupTable[, IsIn := 0 ^ !foverlaps(lookupTable, referenceTable,
                                         by.x = c("X", "Y"),
                                         mult = "first",
                                         nomatch = 0L,
                                         which = TRUE)]
    #             X IsIn         Y
    #  1: 0.2059746    0 0.2059746
    #  2: 0.1765568    0 0.1765568
    #  3: 0.6870228    1 0.6870228
    #  4: 0.3841037    1 0.3841037
    #  5: 0.7698414    0 0.7698414
    #  6: 0.4976992    1 0.4976992
    #  7: 0.7176185    1 0.7176185
    #  8: 0.9919061    0 0.9919061
    #  9: 0.3800352    1 0.3800352
    # 10: 0.7774452    0 0.7774452
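The `0 ^ !` conversion can be checked on its own in base R. In this small sketch, `idx` is a made-up stand-in for the kind of vector the `foverlaps` call is assumed to return with `which = TRUE`, `mult = "first"` and `nomatch = 0L`: a row number in the reference table, or 0 where nothing matched.

```r
# Hypothetical match indices: 0 means "no match" (illustration only)
idx <- c(0L, 3L, 0L, 1L)

no_match <- !idx        # TRUE where idx is 0, FALSE otherwise
is_in <- 0 ^ no_match   # 0^TRUE is 0^1 = 0; 0^FALSE is 0^0 = 1
is_in

# An equivalent and arguably more readable form
as.integer(idx > 0L)
```

Both forms map "no match" to 0 and "some match" to 1; the first yields a numeric vector and the second an integer vector.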