I have a list of arbitrary length, and I need to split it up into chunks of the same size and operate on them. There are some obvious ways to do this, like keeping a counter and two lists, and when the second list fills up, adding it to the first list and emptying the second list for the next round of data, but this could potentially be extremely expensive.
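For reference, a rough sketch of that counter-and-two-lists approach (my own illustration of what is described above, not a proposed solution):

    def chunks_naive(lst, n):
        result, current = [], []
        for item in lst:
            current.append(item)
            if len(current) == n:   # second list is full: move it over...
                result.append(current)
                current = []        # ...and start a fresh one for the next round
        if current:                 # keep the final, possibly shorter, chunk
            result.append(current)
        return result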
I was wondering if anyone had a nice solution to this for lists of any length, e.g. using generators.
I was looking for something useful in itertools, but I couldn't find anything obviously useful there. Might have missed it, though.
Related question: What is the most "pythonic" way to iterate over a list in chunks?
Here's a generator that yields the chunks you want:
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
If you are using Python 2, you should use xrange() instead of range():
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]
You can also simply use a list comprehension instead of writing a function. Python 3:
[lst[i:i + n] for i in range(0, len(lst), n)]
Python 2 version:
[lst[i:i + n] for i in xrange(0, len(lst), n)]
If you want something super simple:
def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in xrange(0, len(l), n))
In the case of Python 3.x, use range() instead of xrange().
Straight from the (old) Python documentation (the itertools recipes):
from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
The current version, as suggested by J.F.Sebastian:
#from itertools import izip_longest as zip_longest  # for Python 2.x
from itertools import zip_longest  # for Python 3.x
#from six.moves import zip_longest  # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
I guess Guido's time machine works - worked - will work - will have worked - was working again.
These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin over "each" iterator; but because it is the same iterator, each such call advances it, so each zip round-robin produces one tuple of n items.
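To make that concrete, here is a small demonstration of my own (Python 3 names, variables chosen purely for illustration): the list holds n references to one iterator, so every group pulled by zip_longest advances the same underlying stream.

    from itertools import zip_longest

    it = iter('abcdefg')
    args = [it] * 3                 # three references to the *same* iterator
    print(args[0] is args[2])       # True
    print(list(zip_longest(*args, fillvalue='x')))
    # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]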
I know this is kind of old, but nobody has yet mentioned numpy.array_split:
import numpy as np

lst = range(50)
np.array_split(lst, 5)
# [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
#  array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
#  array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
#  array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
#  array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
I'm surprised nobody has thought of using iter's two-argument form:
from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())
Demo:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:
from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)
Demo:
>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
Like the izip_longest-based solutions, the above always pads. As far as I know, there is no one- or two-line itertools recipe for a function that optionally pads. By combining the two approaches above, this one comes pretty close:
_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval == _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)
Demo:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
I believe this is the shortest chunker proposed here that offers optional padding.
As Tomasz Gandor observed, both padding chunkers will stop unexpectedly if they encounter a long run of pad values. Here is a final variation that addresses that problem in a reasonable way:
_no_padding = object()

def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval == _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))
Demo:
>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
Here is a generator that works on arbitrary iterables:
import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))
Example:
>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
def chunk(input, size):
    return map(None, *([iter(input)] * size))
Simple and elegant.
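Note that map(None, ...) only works in Python 2. A rough Python 3 equivalent, assuming you still want the trailing group padded with None, would swap in itertools.zip_longest:

    from itertools import zip_longest

    def chunk(input, size):
        # Same single-iterator trick; zip_longest pads the last group with None.
        return list(zip_longest(*([iter(input)] * size)))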
l = range(1, 1000)
print [l[x:x+10] for x in xrange(0, len(l), 10)]
Or, if you prefer:
chunks = lambda l, n: [l[x: x+n] for x in xrange(0, len(l), n)]
chunks(l, 10)
I saw the most awesome Python-ish answer in a duplicate of this question:
from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]
You can create n-tuples for any n. If a = range(1, 15), then the result would be:
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]
If the list divides evenly, you can replace zip_longest with zip; otherwise the final group (13, 14, None) would be lost entirely, because plain zip drops the leftover elements. Python 3 is used above; for Python 2, use izip_longest.
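A quick check of that difference (my own demo, same setup as above):

    >>> a = range(1, 15)
    >>> i = iter(a)
    >>> list(zip(i, i, i))     # plain zip: 13 and 14 are silently dropped
    [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]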
None of these answers produce evenly sized chunks; they all leave a runt chunk at the end, so they are not completely balanced. If you were using these functions to distribute work, you have built in the prospect of one worker likely finishing well before the others, so it would sit around doing nothing while the others keep working hard.
For example, the current top answer ends with:
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [70, 71, 72, 73, 74]]
I hate that little runt at the end!
Others, like list(grouper(3, xrange(7))) and chunk(xrange(7), 3), both return [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None values are just padding, which is rather inelegant in my opinion, and they do not chunk the iterable evenly either.
Why can't we divide these items up more evenly?
Here's a balanced solution, adapted from a function I've used in production (note: in Python 3, replace xrange with range):
def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in xrange(maxbaskets)]  # in Python 3 use range
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return filter(None, baskets)
And I created a generator that does the same thing if you put it into a list:
def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in xrange(baskets):
        yield [items[y_i] for y_i in xrange(x_i, item_count, baskets)]
And finally, since I see that all of the above functions return elements in contiguous order (as they were given):
def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing a iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in xrange(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [items.next() for _ in xrange(length)]
Test them:
print(baskets_from(xrange(6), 8))
print(list(iter_baskets_from(xrange(6), 8)))
print(list(iter_baskets_contiguous(xrange(6), 8)))
print(baskets_from(xrange(22), 8))
print(list(iter_baskets_from(xrange(22), 8)))
print(list(iter_baskets_contiguous(xrange(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(xrange(26), 5))
print(list(iter_baskets_from(xrange(26), 5)))
print(list(iter_baskets_contiguous(xrange(26), 5)))
Which prints out:
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]
Notice that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one can divide a list of discrete elements.
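For what it's worth, if I recall the more_itertools API correctly, mit.divide produces this same kind of balanced, contiguous split without any hand-rolled code:

    import more_itertools as mit

    [list(part) for part in mit.divide(3, range(7))]
    # [[0, 1, 2], [3, 4], [5, 6]]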
If you know the list size:
def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]
If you don't (an iterator):
def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion
In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of the given size (i.e. there is no incomplete last chunk).
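The answer doesn't spell out that prettier form; presumably it means the single-iterator zip trick, which could look roughly like this (my own sketch, valid only under the no-incomplete-last-chunk assumption):

    def IterChunksExact(sequence, chunk_size):
        # zip() silently drops any incomplete trailing group, so this is only
        # correct when the input length is an exact multiple of chunk_size.
        return zip(*[iter(sequence)] * chunk_size)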
For example, if you had a chunk size of 3, you could do:
zip(*[iterable[i::3] for i in range(3)])
Source: http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/
I would use this when my chunk size is a fixed number I can type directly, e.g. '3', and it will never change.
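If you do want the same idea with a variable chunk size, a small generalization (my own wrapper around the recipe; like the original it drops any incomplete final chunk) could be:

    def chunk_strided(seq, n):
        # n strided slices, re-interleaved by zip into contiguous n-tuples
        return list(zip(*[seq[i::n] for i in range(n)]))

    chunk_strided(list(range(9)), 3)   # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]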
The toolz library has the partition function for this:
from toolz.itertoolz.core import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]
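Note that partition drops (or, given a pad argument, pads) an incomplete final group. If I remember the toolz API correctly, partition_all keeps the short remainder instead:

    from toolz import partition_all

    list(partition_all(2, [1, 2, 3, 4, 5]))
    # [(1, 2), (3, 4), (5,)]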
I'm a big fan of the Python doc version proposed by tzot and J.F.Sebastian, but it has two shortcomings:
it is not very explicit
I typically don't want a fill value in the last chunk
This is what I'm using a lot in my code:
from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        yield tuple(islice(iterable, n)) or iterable.next()
UPDATE: a lazy chunks version:
from itertools import chain, islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        yield chain([next(iterable)], islice(iterable, n-1))
Code:
def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print split_list(a_list, 3)
Result:
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
At this point, I think we need a recursive generator, just in case...
In Python 2:
def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
In Python 3:
def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)
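A quick check of the recursive version (my own example):

    >>> list(chunks([1, 2, 3, 4, 5], 2))
    [[1, 2], [3, 4], [5]]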
Also, in case of a massive alien invasion, a decorated recursive generator might come in handy:
def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
I was curious about the performance of the different approaches, so here it is:
Tested on Python 3.5.1:
import time
batch_size = 7
arr_len = 298937

#---------slice-------------
print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break
    tmp = arr[0:batch_size]
    arr = arr[batch_size:-1]
print(time.time() - start)

#-----------index-----------
print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------
from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)

print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------
from itertools import zip_longest  # for Python 3.x
#from six.moves import zip_longest  # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)
Results:
slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844
[AA[i:i+SS] for i in range(len(AA))[::SS]]
Where AA is the array and SS is the chunk size. For example:
>>> AA = range(10, 21); SS = 3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3
You can also use the get_chunks function from the utilspie library:
>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]
You can install utilspie via pip:
sudo pip install utilspie
Disclaimer: I am the creator of the utilspie library.
Heh, a one-line version:
In [48]: chunk = lambda ulist, step: map(lambda i: ulist[i:i+step], xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]
def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop
Usage:
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq
Without calling len(), which is useful for large lists:
def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]
And this is for an iterable:
from itertools import islice

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))
A functional flavour of the above:
from itertools import islice, repeat, takewhile

def isplitter2(l, n):
    return takewhile(bool, (tuple(islice(start, n)) for start in repeat(iter(l))))
Or:
from itertools import islice, imap, repeat  # Python 2 (imap)

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next, ())
Or:
from itertools import islice, imap, repeat, takewhile  # Python 2 (imap)

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool, imap(tuple, continuous_slices))
Another, more explicit version.
def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists
    that have a length equals to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3))
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList
One more solution:
def make_chunks(data, chunk_size):
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk

>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print chunk
...
[1, 2]
[3, 4]
[5, 6]
[7]
>>>
Consider using matplotlib.cbook pieces.
For example:
import numpy as np
import matplotlib.cbook as cbook

segments = cbook.pieces(np.arange(20), 3)
for s in segments:
    print s
def chunks(iterable, n):
    """assumes n is an integer > 0"""
    iterable = iter(iterable)
    while True:
        result = []
        for i in range(n):
            try:
                a = next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1 = (i*i for i in range(10))
g2 = chunks(g1, 3)
print g2
''
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'
See this reference:
>>> orange = range(1, 1001)
>>> otuples = list(zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>>
Python3
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in range((len(a) + CHUNK - 1) // CHUNK)]
At this point, I think we need the obligatory anonymous recursive function.
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])
Since everybody here is talking about iterators: the boltons package has a method that is perfect for this, called iterutils.chunked_iter.
from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))
Output:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]
But if you don't want to be merciful to memory, you can do it the old way and store the whole thing as a list up front, using iterutils.chunked.
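A minimal sketch of that variant, assuming chunked takes the same arguments as chunked_iter:

    from boltons import iterutils

    iterutils.chunked(range(50), 11)   # builds and returns the full list of chunks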
Here is a list of additional approaches:
Given
import itertools as it
import collections as ct
import more_itertools as mit

iterable = range(11)
n = 3
Code
The standard library
list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)

list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)

list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
more_itertools+
list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]

list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
References
zip_longest (related posts)
setdefault (ordered results require Python 3.6+)
collections.defaultdict (ordered results require Python 3.6+)
more_itertools.chunked (related post)
more_itertools.sliced
more_itertools.grouper (related post)
more_itertools.windowed (see also stagger, zip_offset)
+ A third-party library that implements itertools recipes and more: > pip install more_itertools