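I want to use the data.table package with the argument roll = 'nearest' to find the nearest match on a date column. Before that, I first match on another column (letters):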
    library(data.table)

    set.seed(1)

    # rep_len() makes the recycling of the 4 letters/values explicit
    A <- data.table(
      dates.A   = seq.Date(as.Date('2008-01-01'), as.Date('2008-01-31'), by = '3 days'),
      letters.A = rep_len(LETTERS[1:4], 11),
      value.A   = rep_len(runif(4), 11)
    )

    B <- data.table(
      date.B    = seq.Date(as.Date('2008-01-01'), as.Date('2008-01-05'), by = 'days'),
      letters.B = rep_len(LETTERS[1:4], 5),
      value.B   = rep_len(runif(4), 5)
    )

    #### Define the columns I merge on
    A[, `:=`(dates.merge = dates.A, letters.merge = letters.A)]
    B[, `:=`(dates.merge = date.B,  letters.merge = letters.B)]

    setkeyv(A, c('letters.merge', 'dates.merge'))
    setkeyv(B, c('letters.merge', 'dates.merge'))

    # For each row of A, look up the matching row of B, rolling to the nearest date
    result <- B[A, roll = 'nearest']

    #### As a side note, how do I avoid the change in order of my data.tables??
    setorder(result, dates.A, letters.A)
    setorder(A, dates.A)
    setorder(B, date.B)
The output of result, A and B looks like this:
    > result
            date.B letters.B   value.B dates.merge letters.merge    dates.A letters.A   value.A
     1: 2008-01-01         A 0.2016819  2008-01-01             A 2008-01-01         A 0.2655087
     2: 2008-01-02         B 0.8983897  2008-01-04             B 2008-01-04         B 0.3721239
     3: 2008-01-03         C 0.9446753  2008-01-07             C 2008-01-07         C 0.5728534
     4: 2008-01-04         D 0.6607978  2008-01-10             D 2008-01-10         D 0.9082078
     5: 2008-01-05         A 0.2016819  2008-01-13             A 2008-01-13         A 0.2655087
     6: 2008-01-02         B 0.8983897  2008-01-16             B 2008-01-16         B 0.3721239
     7: 2008-01-03         C 0.9446753  2008-01-19             C 2008-01-19         C 0.5728534
     8: 2008-01-04         D 0.6607978  2008-01-22             D 2008-01-22         D 0.9082078
     9: 2008-01-05         A 0.2016819  2008-01-25             A 2008-01-25         A 0.2655087
    10: 2008-01-02         B 0.8983897  2008-01-28             B 2008-01-28         B 0.3721239
    11: 2008-01-03         C 0.9446753  2008-01-31             C 2008-01-31         C 0.5728534

    > A
           dates.A letters.A   value.A dates.merge letters.merge
     1: 2008-01-01         A 0.2655087  2008-01-01             A
     2: 2008-01-04         B 0.3721239  2008-01-04             B
     3: 2008-01-07         C 0.5728534  2008-01-07             C
     4: 2008-01-10         D 0.9082078  2008-01-10             D
     5: 2008-01-13         A 0.2655087  2008-01-13             A
     6: 2008-01-16         B 0.3721239  2008-01-16             B
     7: 2008-01-19         C 0.5728534  2008-01-19             C
     8: 2008-01-22         D 0.9082078  2008-01-22             D
     9: 2008-01-25         A 0.2655087  2008-01-25             A
    10: 2008-01-28         B 0.3721239  2008-01-28             B
    11: 2008-01-31         C 0.5728534  2008-01-31             C

    > B
           date.B letters.B   value.B dates.merge letters.merge
    1: 2008-01-01         A 0.2016819  2008-01-01             A
    2: 2008-01-02         B 0.8983897  2008-01-02             B
    3: 2008-01-03         C 0.9446753  2008-01-03             C
    4: 2008-01-04         D 0.6607978  2008-01-04             D
    5: 2008-01-05         A 0.2016819  2008-01-05             A
However, note that the nearest date in B to dates.A '2008-01-07' should be '2008-01-05' (see B), not '2008-01-03'. The same holds for every row of result whose dates.A comes after '2008-01-07'.
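To make the expectation concrete, here is a minimal sketch that rolls on the date alone. It is only an illustration: the letters and value columns are left out, the key is the single column dates.merge, and the name by_date is made up for this example.

```r
library(data.table)

# Same dates as in the example above; letters and values omitted for brevity
A <- data.table(dates.A = seq.Date(as.Date("2008-01-01"), as.Date("2008-01-31"), by = "3 days"))
B <- data.table(date.B  = seq.Date(as.Date("2008-01-01"), as.Date("2008-01-05"), by = "days"))

# Single merge column, so the roll has no other key column to match first
A[, dates.merge := dates.A]
B[, dates.merge := date.B]
setkey(A, dates.merge)
setkey(B, dates.merge)

# For each row of A, take the row of B whose date is nearest
by_date <- B[A, roll = "nearest"]

# The row for 2008-01-07 now picks up 2008-01-05 from B
by_date[dates.A == as.Date("2008-01-07")]
```

This gives the dates I would expect, but it drops the match on letters entirely, which is not what the two-key join above does.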
What am I doing wrong here?