Consider the following script, in which I test two ways of performing some computations on generators obtained via itertools.tee:

    #!/usr/bin/env python3

    from sys import argv
    from itertools import tee
    from multiprocessing import Process


    def my_generator():
        for i in range(5):
            print(i)
            yield i


    def double(x):
        return 2 * x


    def compute_double_sum(iterable):
        s = sum(map(double, iterable))
        print(s)


    def square(x):
        return x * x


    def compute_square_sum(iterable):
        s = sum(map(square, iterable))
        print(s)


    g1, g2 = tee(my_generator(), 2)

    try:
        processing_type = argv[1]
    except IndexError:
        processing_type = "no_multi"

    if processing_type == "multi":
        p1 = Process(target=compute_double_sum, args=(g1,))
        p2 = Process(target=compute_square_sum, args=(g2,))
        print("p1 starts")
        p1.start()
        print("p2 starts")
        p2.start()
        p1.join()
        print("p1 finished")
        p2.join()
        print("p2 finished")
    else:
        compute_double_sum(g1)
        compute_square_sum(g2)

Here is what I get when I run the script in "normal" mode:

    $ ./test_tee.py
    0
    1
    2
    3
    4
    20
    30

And here is the output in parallel mode:

    $ ./test_tee.py multi
    p1 starts
    p2 starts
    0
    1
    2
    3
    4
    20
    0
    1
    2
    3
    4
    30
    p1 finished
    p2 finished

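The initial generator is apparently "copied" somehow, and executed twice: in parallel mode the values 0 to 4 are printed once per process, whereas in normal mode they are printed only once.

I would like to avoid this, because in my real application it seems to trigger a bug in one of the external libraries I use to create the initial generator (https://github.com/pysam-developers/pysam/issues/397), while still being able to do the computations in parallel on the same generated values.

Is there a way to achieve what I want?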
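
For reference, here is a minimal sketch of the behaviour I am after, not a confirmed solution. It assumes a POSIX platform where the "fork" start method is available, that the generated values can be pickled (they travel through multiprocessing queues), and that None never occurs as a real value, since it is used as an end-of-stream marker. The names worker, fan_out, produced and _DONE are made up for this illustration. The generator is iterated only in the parent process, and each value is pushed to one queue per worker:

```python
#!/usr/bin/env python3
import multiprocessing as mp

# Hypothetical end-of-stream marker; assumes None is never a real value.
_DONE = None

# Records what the generator produced in the parent process.
produced = []


def my_generator():
    for i in range(5):
        produced.append(i)
        yield i


def double(x):
    return 2 * x


def square(x):
    return x * x


def worker(func, in_queue, out_queue):
    """Sum func(value) over everything read from in_queue until the marker."""
    total = 0
    for value in iter(in_queue.get, _DONE):
        total += func(value)
    out_queue.put((func.__name__, total))


def fan_out(source, funcs):
    """Iterate source once, in this process, and feed each value to every worker."""
    ctx = mp.get_context("fork")  # assumption: POSIX platform
    in_queues = [ctx.Queue() for _ in funcs]
    out_queue = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(func, queue, out_queue))
        for func, queue in zip(funcs, in_queues)
    ]
    for proc in procs:
        proc.start()
    # The generator runs here only; the workers never see it.
    for value in source:
        for queue in in_queues:
            queue.put(value)
    for queue in in_queues:
        queue.put(_DONE)
    # Collect one (name, total) pair per worker before joining.
    results = dict(out_queue.get() for _ in procs)
    for proc in procs:
        proc.join()
    return results


if __name__ == "__main__":
    print(fan_out(my_generator(), [double, square]))
```

With this layout the generator body runs once, at the cost of pickling every value once per worker. Whether that is acceptable for the objects yielded by the external library is something I have not checked.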