I have a relatively simple block of code that loops over two arrays, multiplying and adding cumulatively:

    import numpy as np

    a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
    b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
    c = []
    d = 0
    for i, val in enumerate(a):
        d += val
        c.append(d)
        d *= b[i]

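Is there a way to do this without iterating? I imagine cumsum/cumprod could be used, but I can't figure out how. When you break down what is happening step by step, it looks like this:

    # 0: 0 + a[0]
    # 1: ((0 + a[0]) * b[0]) + a[1]
    # 2: ((((0 + a[0]) * b[0]) + a[1]) * b[1]) + a[2]
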
Edit for clarification: I am interested in the list (or array) c.
At each iteration, you have:

    d[n+1] = d[n] + a[n]
    d[n+1] = d[n+1] * b[n]

Thus, essentially:

    d[n+1] = (d[n] + a[n]) * b[n]

i.e.

    d[n+1] = (d[n] * b[n]) + K[n]   # where `K[n] = a[n] * b[n]`

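As a quick sanity check on the recurrence, here is a minimal sketch that steps `d` with `d[n+1] = d[n]*b[n] + K[n]` and recovers each stored value as `c[n] = d[n] + a[n]`, since the loop appends to `c` before multiplying by `b`. The function name `c_from_recurrence` is made up for illustration and is not part of the original answer.

```python
import numpy as np

def c_from_recurrence(a, b):
    """Rebuild c using d[n+1] = d[n]*b[n] + K[n], with K[n] = a[n]*b[n]."""
    K = a * b
    d = 0.0                      # d[0], the starting value of the loop
    c = []
    for n in range(len(a)):
        c.append(d + a[n])       # value stored before scaling by b[n]
        d = d * b[n] + K[n]      # same as (d + a[n]) * b[n]
    return np.array(c)

a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
c_check = c_from_recurrence(a, b)
```

This is still a loop; it only confirms that the rewritten recurrence describes the same computation as the original code.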
Now, using this formula, if you write out the expressions for the cases up to n = 2, you get:

    d[1] = d[0]*b[0] + K[0]
    d[2] = d[0]*b[0]*b[1] + K[0]*b[1] + K[1]
    d[3] = d[0]*b[0]*b[1]*b[2] + K[0]*b[1]*b[2] + K[1]*b[2] + K[2]

    Scalars      : b[0]*b[1]*b[2]   b[1]*b[2]   b[2]   1
    Coefficients : d[0]             K[0]        K[1]   K[2]

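To make the scalars/coefficients pairing concrete, the sketch below evaluates `d[3]` twice for the sample arrays from the question: once by stepping the recurrence, and once as the dot product of the two rows above. The variable names `d3_loop` and `d3_closed` are introduced here for illustration only.

```python
import numpy as np

a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
K = a * b

# d[3] obtained by applying d[n+1] = (d[n] + a[n]) * b[n] three times
d = 0.0
for n in range(3):
    d = (d + a[n]) * b[n]
d3_loop = d

# d[3] obtained from the expanded form: sum of scalar * coefficient
scalars = np.array([b[0] * b[1] * b[2], b[1] * b[2], b[2], 1.0])
coeffs = np.array([0.0, K[0], K[1], K[2]])   # d[0] = 0 is the first coefficient
d3_closed = float(np.dot(scalars, coeffs))
```

The scalars row is a cumulative product of `b` taken from the right, which is what motivates the reversed `cumprod`.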
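Thus, you need the reversed cumprod of b, multiplied elementwise with the K array. Finally, to get c, perform a cumsum. Since each value of c is stored before d is scaled by b, you also need to scale the cumsum result down by the reversed cumprod of b.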
The final implementation would look like this:

    # Get reversed cumprod of b and pad with `1` at the end
    b_rev_cumprod = b[::-1].cumprod()[::-1]
    B = np.hstack((b_rev_cumprod,1))

    # Get K
    K = a*b

    # Append with 0 at the start, corresponding to the starting d
    K_ext = np.hstack((0,K))

    # Perform elementwise multiplication and cumsum and scale down for final c
    sums = (B*K_ext).cumsum()
    c = sums[1:]/b_rev_cumprod

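As a side note that is not in the original answer: because the starting `d` is 0 and `K[n] * B[n+1]` equals `a[n] * b_rev_cumprod[n]`, the padding steps appear to collapse into a single expression. The sketch below shows that compact form next to a plain loop for comparison. Both function names are hypothetical, and the equivalence is only checked here on the sample data.

```python
import numpy as np

def loop_reference(a, b):
    """Plain-loop version of the computation from the question."""
    c, d = [], 0.0
    for i, val in enumerate(a):
        d += val
        c.append(d)
        d *= b[i]
    return np.array(c)

def vectorized_compact(a, b):
    """Assumed-equivalent compact form: no padding with 0 or 1 needed."""
    b_rev_cumprod = b[::-1].cumprod()[::-1]
    # Scale each a[n] up by the trailing product, accumulate, then scale back
    return (a * b_rev_cumprod).cumsum() / b_rev_cumprod

a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
c_loop = loop_reference(a, b)
c_compact = vectorized_compact(a, b)
```

The compact form shares the same numerical limits as the padded version, since both divide by the reversed cumulative product.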
Runtime tests and output verification

Function definitions:

    import numpy as np

    def original_approach(a,b):
        c = []
        d = 0
        for i, val in enumerate(a):
            d = d+val
            c.append(d)
            d = d*b[i]
        return c

    def vectorized_approach(a,b):
        b_rev_cumprod = b[::-1].cumprod()[::-1]
        B = np.hstack((b_rev_cumprod,1))

        K = a*b
        K_ext = np.hstack((0,K))
        sums = (B*K_ext).cumsum()
        return sums[1:]/b_rev_cumprod

Runtimes and verification

Case #1: OP's sample case

    In [301]: # Inputs
         ...: a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
         ...: b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
         ...:

    In [302]: original_approach(a,b)
    Out[302]:
    [1,
     2.0099999999999998,
     4.4020000000000001,
     6.1320600000000001,
     7.6132059999999999,
     8.7613205999999995,
     14.256792359999999,
     18.128396179999999]

    In [303]: vectorized_approach(a,b)
    Out[303]:
    array([  1.        ,   2.01      ,   4.402     ,   6.13206   ,
             7.613206  ,   8.7613206 ,  14.25679236,  18.12839618])

Case #2: Large input case

    In [304]: # Inputs
         ...: N = 1000
         ...: a = np.random.randint(0,100000,N)
         ...: b = np.random.rand(N)+0.1
         ...:

    In [305]: np.allclose(original_approach(a,b),vectorized_approach(a,b))
    Out[305]: True

    In [306]: %timeit original_approach(a,b)
    1000 loops, best of 3: 746 µs per loop

    In [307]: %timeit vectorized_approach(a,b)
    10000 loops, best of 3: 76.9 µs per loop

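Please note that for extremely large input arrays, if the elements of b are small fractions like these, the cumulative operations can make the leading entries of b_rev_cumprod underflow to zero, which results in NaNs at those leading positions of the output.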
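The caveat can be reproduced with a small sketch. The input below is an assumption chosen to force the problem (2000 entries of `b` all equal to 0.1, so the full product is far below the smallest representable double); it is not taken from the original post.

```python
import numpy as np

N = 2000
a = np.ones(N)
b = np.full(N, 0.1)          # assumed worst case: every factor is a small fraction

b_rev_cumprod = b[::-1].cumprod()[::-1]
B = np.hstack((b_rev_cumprod, 1))
K_ext = np.hstack((0, a * b))

# Silence the 0/0 warnings so the NaNs can be inspected directly
with np.errstate(divide="ignore", invalid="ignore"):
    c = (B * K_ext).cumsum()[1:] / b_rev_cumprod

n_zero = int((b_rev_cumprod == 0).sum())   # leading products that underflowed
n_nan = int(np.isnan(c).sum())             # output positions lost as a result
```

The trailing entries of `c` stay finite because the products near the end of the array involve only a few factors. If the leading values matter for such inputs, the plain loop is the safer choice.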