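Given an array 'array' and a set of indices 'indices', how do I find the cumulative sum of the sub-arrays formed by splitting the array along those indices, in a vectorized manner? To clarify, suppose I have: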
    >>> array = np.arange(20)
    >>> array
    array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
    >>> indices = np.array([3, 8, 14])
The operation should output:

    array([ 0,  1,  3,  3,  7, 12, 18, 25,  8, 17, 27, 38, 50, 63, 14, 29, 45, 62, 80, 99])
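Note that the array is very large (100,000 elements), so I need a vectorized answer. Using any loop would slow it down considerably. Also, if I had the same problem with a 2D array and corresponding indices, and needed to do the same thing for each row of the array, how would I do it?

For the 2D version: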
    >>> array = np.arange(12).reshape((3, 4))
    >>> array
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    >>> # The rows have different lengths, so recent NumPy needs dtype=object here
    >>> indices = np.array([[2], [1, 3], [1, 2]], dtype=object)
The output would be:

    array([[ 0,  1,  3,  3],
           [ 4,  9,  6, 13],
           [ 8, 17, 10, 11]])
Clarification: every row will be split.
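You can introduce a differentiation of the originally cumulatively summed array at the indices positions to create a boundary-like effect at those places, so that when the differentiated array is cumulatively summed, it gives the cumulative sums that restart at each index. This might look a bit contrived at first, but stick with it and try it on other samples, and hopefully it will make sense! The idea is very similar to the one applied in this other MATLAB solution.

Following that philosophy, here is one approach that uses numpy.diff together with cumulative summation: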
    import numpy as np

    # Get linear indices
    n = array.shape[1]
    lidx = np.hstack([i*n + np.array(item) for i, item in enumerate(indices)])

    # Get successive differentiations
    diffs = array.cumsum(1).ravel()[lidx] - array.ravel()[lidx]

    # Get previous group's offsetted summations for each row at all
    # indices positions across the entire 2D array
    # (integer division, so that each entry is the row number)
    _, idx = np.unique(lidx // n, return_index=True)
    offsetted_diffs = np.diff(np.append(0, diffs))
    offsetted_diffs[idx] = diffs[idx]

    # Get a copy of input array and place previous group's offsetted summations
    # at indices. Then, do cumulative sum which will create a boundary like
    # effect with those offsets at indices positions.
    arrayc = array.copy()
    arrayc.ravel()[lidx] -= offsetted_diffs
    out = arrayc.cumsum(1)
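This should be an almost vectorized solution. "Almost" because the linear indices are still computed in a loop, but since that is not the computationally intensive part here, its effect on the total runtime should be minimal. Also, if you don't mind overwriting the input to save memory, you can replace arrayc with array.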
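For the 1D case from the question, the same trick can be sketched without any loop. This is only a minimal illustration: the helper name segmented_cumsum_1d is hypothetical, and it assumes the indices are sorted, unique, and within bounds.

```python
import numpy as np

def segmented_cumsum_1d(array, indices):
    """Cumulative sum that restarts at every position in `indices`.

    Hypothetical helper; assumes `indices` is sorted, unique and in bounds.
    """
    array = np.asarray(array)
    indices = np.asarray(indices, dtype=np.intp)
    # Running total of everything that comes before each restart position.
    before = array.cumsum()[indices] - array[indices]
    # Only the part contributed by the immediately preceding segment has to
    # be removed at each restart, hence the successive differences.
    offsets = np.diff(np.append(0, before))
    work = array.copy()
    work[indices] -= offsets
    # One final cumulative sum now resets at every restart position.
    return work.cumsum()

array = np.arange(20)
indices = np.array([3, 8, 14])
result = segmented_cumsum_1d(array, indices)
print(result)
```

For the question's input this should reproduce the expected 1D output, with the running total restarting at positions 3, 8 and 14.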
Sample input and output:
    In [75]: array
    Out[75]:
    array([[ 0,  1,  2,  3,  4,  5,  6,  7],
           [ 8,  9, 10, 11, 12, 13, 14, 15],
           [16, 17, 18, 19, 20, 21, 22, 23]])

    In [76]: indices
    Out[76]: array([[3, 6], [4, 7], [5]], dtype=object)

    In [77]: out
    Out[77]:
    array([[ 0,  1,  3,  3,  7, 12,  6, 13],
           [ 8, 17, 27, 38, 12, 25, 39, 15],
           [16, 33, 51, 70, 90, 21, 43, 66]])
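One way to sanity-check a result like this is to compare it against a plain loop-based reference built from np.split. The sketch below is meant only for verification on small inputs, not as a fast solution; the function name segmented_cumsum_reference is hypothetical.

```python
import numpy as np

def segmented_cumsum_reference(array, indices):
    """Loop-based reference: split each row at its indices, cumsum each piece.

    Hypothetical helper intended for checking results on small inputs.
    """
    rows = []
    for row, idx in zip(array, indices):
        # np.split cuts the row just before each position listed in idx.
        pieces = np.split(row, idx)
        rows.append(np.concatenate([p.cumsum() for p in pieces]))
    return np.vstack(rows)

array = np.arange(24).reshape(3, 8)
indices = [[3, 6], [4, 7], [5]]
expected = segmented_cumsum_reference(array, indices)
print(expected)
```

If the vectorized out from the approach above is available, np.array_equal(out, expected) should hold for the same inputs.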