What is the best way to find the maximum number of consecutive NaNs in a numpy array?
Examples:
from numpy import nan
Input 1: [nan, nan, nan, 0.16, 1, 0.16, 0.9999, 0.0001, 0.16, 0.101, nan, 0.16]
Output 1: 3
Input 2: [nan, nan, 2, 1, 1, nan, nan, nan, nan, 0.101, nan, 0.16]
Output 2: 4
Here's one approach -
import numpy as np

def max_repeatedNaNs(a):
    # Mask of NaNs, padded with False at both ends so every run has a start and an end
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        return 0
    else:
        # Count of NaNs in each NaN group (run ends minus run starts).
        # Then, get max count as o/p.
        c = np.flatnonzero(mask[1:] < mask[:-1]) - \
            np.flatnonzero(mask[1:] > mask[:-1])
        return c.max()
Here's an improved version -
def max_repeatedNaNs_v2(a):
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        return 0
    else:
        # Indices where the padded mask flips: starts and ends alternate
        idx = np.nonzero(mask[1:] != mask[:-1])[0]
        return (idx[1::2] - idx[::2]).max()
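To see why pairing up the transition indices works, here is a small sketch that applies the same steps to the two inputs from the question and returns every run length rather than only the maximum. The helper name `nan_run_lengths` is made up for illustration and is not part of the answer's code.

```python
import numpy as np

def nan_run_lengths(a):
    # Hypothetical helper: length of every NaN run, in order of appearance.
    mask = np.concatenate(([False], np.isnan(a), [False]))
    # Positions where the padded mask flips. Because of the False padding,
    # flips alternate: even positions are run starts, odd positions are
    # run ends (exclusive).
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    return idx[1::2] - idx[::2]

nan = np.nan
in1 = [nan, nan, nan, 0.16, 1, 0.16, 0.9999, 0.0001, 0.16, 0.101, nan, 0.16]
in2 = [nan, nan, 2, 1, 1, nan, nan, nan, nan, 0.101, nan, 0.16]

lengths1 = nan_run_lengths(in1)
lengths2 = nan_run_lengths(in2)
print(lengths1.tolist())  # [3, 1]
print(lengths2.tolist())  # [2, 4, 1]
```

Taking `.max()` of these arrays gives 3 and 4, matching the expected outputs. For an input without NaNs the result is an empty array, which is why the functions above return 0 early instead of calling `.max()`.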
Benchmarking against @pltrdy's comment -
In [77]: a = np.random.rand(10000)

In [78]: a[np.random.choice(range(len(a)),size=1000,replace=0)] = np.nan

In [79]: %timeit contiguous_NaN(a) #@pltrdy's solution
100 loops, best of 3: 15.8 ms per loop

In [80]: %timeit max_repeatedNaNs(a)
10000 loops, best of 3: 103 µs per loop

In [81]: %timeit max_repeatedNaNs_v2(a)
10000 loops, best of 3: 86.4 µs per loop
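Outside IPython, a rough way to sanity-check a vectorized version is to compare it against a plain-Python loop on similar random data and time both with the standard `timeit` module. The loop `max_nan_run_loop` below is a hypothetical baseline written for this sketch; it is not @pltrdy's `contiguous_NaN`, which isn't shown here, and timings will differ from the numbers above depending on machine and NumPy version.

```python
import timeit
import numpy as np

def max_repeatedNaNs_v2(a):
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        return 0
    idx = np.nonzero(mask[1:] != mask[:-1])[0]
    return (idx[1::2] - idx[::2]).max()

def max_nan_run_loop(a):
    # Hypothetical pure-Python baseline, used only for cross-checking.
    best = cur = 0
    for is_nan in np.isnan(a):
        cur = cur + 1 if is_nan else 0  # extend the current run or reset it
        best = max(best, cur)
    return best

# Same shape of test data as the session above: 10000 values, 1000 NaNs.
rng = np.random.default_rng(0)
a = rng.random(10000)
a[rng.choice(len(a), size=1000, replace=False)] = np.nan

# Both approaches should agree before any timing is worth reading.
assert max_repeatedNaNs_v2(a) == max_nan_run_loop(a)

t_vec = timeit.timeit(lambda: max_repeatedNaNs_v2(a), number=100) / 100
t_loop = timeit.timeit(lambda: max_nan_run_loop(a), number=10) / 10
print(f"vectorized: {t_vec * 1e6:.1f} µs per call")
print(f"loop:       {t_loop * 1e6:.1f} µs per call")
```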