I am using dplyr to replace value with NA when a condition is met, but it also puts NA in places where it should not.
dput of the data:
    df <- structure(list(
      id = c("USC00231275", "USC00231275", "USC00231275", "USC00231275", "USC00231275",
             "USC00231275", "USC00231275", "USC00231275", "USC00231275", "USC00231275"),
      element = c("TMAX", "TMIN", "TMAX", "TMIN", "TMAX", "TMIN", "TMAX", "TMIN", "TMAX", "TMIN"),
      year = c(1937, 1937, 1937, 1937, 1937, 1937, 1937, 1937, 1937, 1937),
      month = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5),
      day = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
      date = structure(c(-11933, -11933, -11932, -11932, -11931, -11931, -11930, -11930,
                         -11929, -11929), class = "Date"),
      value = c(0, 53.96, 68, 44.96, 62.06, 53.96, 73.04, 53.96, 69.08, 50)),
      .Names = c("id", "element", "year", "month", "day", "date", "value"),
      row.names = c(NA, 10L), class = "data.frame")
The data.frame (note: the condition is met only in rows 1 and 2):
                id element year month day       date value
    1  USC00231275    TMAX 1937     5   1 1937-05-01  0.00
    2  USC00231275    TMIN 1937     5   1 1937-05-01 53.96
    3  USC00231275    TMAX 1937     5   2 1937-05-02 68.00
    4  USC00231275    TMIN 1937     5   2 1937-05-02 44.96
    5  USC00231275    TMAX 1937     5   3 1937-05-03 62.06
    6  USC00231275    TMIN 1937     5   3 1937-05-03 53.96
    7  USC00231275    TMAX 1937     5   4 1937-05-04 73.04
    8  USC00231275    TMIN 1937     5   4 1937-05-04 53.96
    9  USC00231275    TMAX 1937     5   5 1937-05-05 69.08
    10 USC00231275    TMIN 1937     5   5 1937-05-05 50.00
The dplyr code and its output:
    df %>%
      group_by(date) %>%
      mutate(
        value = if(value[element == 'TMIN'] >= value[element == 'TMAX']) as.numeric(NA) else value
      )

                id element  year month   day       date value
             (chr)   (chr) (dbl) (dbl) (dbl)     (date) (dbl)
    1  USC00231275    TMAX  1937     5     1 1937-05-01    NA
    2  USC00231275    TMIN  1937     5     1 1937-05-01    NA
    3  USC00231275    TMAX  1937     5     2 1937-05-02 68.00
    4  USC00231275    TMIN  1937     5     2 1937-05-02 44.96
    5  USC00231275    TMAX  1937     5     3 1937-05-03    NA
    6  USC00231275    TMIN  1937     5     3 1937-05-03    NA
    7  USC00231275    TMAX  1937     5     4 1937-05-04 73.04
    8  USC00231275    TMIN  1937     5     4 1937-05-04 53.96
    9  USC00231275    TMAX  1937     5     5 1937-05-05 69.08
    10 USC00231275    TMIN  1937     5     5 1937-05-05 50.00
Note that the only rows that should change are 1 and 2, but dplyr also changed rows 5 and 6, even though the condition is not met there.
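
To make the expected result concrete, here is a minimal base R sketch of the rule "set value to NA for every row of a date where TMIN >= TMAX". It is only an illustration of the intended outcome, not a diagnosis of the dplyr behaviour. It rebuilds a reduced stand-in for the data (only the element, date and value columns) and assumes exactly one TMAX and one TMIN row per date; the names tmin, tmax, bad_dates and expected are made up for this sketch.

```r
# Reduced stand-in for the data above: one TMAX and one TMIN row per date.
df <- data.frame(
  element = rep(c("TMAX", "TMIN"), times = 5),
  date    = rep(as.Date("1937-05-01") + 0:4, each = 2),
  value   = c(0, 53.96, 68, 44.96, 62.06, 53.96, 73.04, 53.96, 69.08, 50)
)

# Per-date TMIN and TMAX, named by date
# (assumption: exactly one TMIN and one TMAX row per date).
tmin <- with(df[df$element == "TMIN", ], setNames(value, as.character(date)))
tmax <- with(df[df$element == "TMAX", ], setNames(value, as.character(date)))

# Dates on which the condition TMIN >= TMAX holds.
bad_dates <- names(tmin)[tmin >= tmax[names(tmin)]]

# Expected result: NA only on rows whose date is flagged.
expected <- df
expected$value[as.character(expected$date) %in% bad_dates] <- NA

which(is.na(expected$value))
# [1] 1 2
```

On this sample only 1937-05-01 is flagged, so rows 1 and 2 are the only ones that become NA, which is the result described above.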