A file contains 10000 lines, with one entry per line. I need to process the file, but in batches (small chunks).
```python
file = open("data.txt", "r")
data = file.readlines()
file.close()

total_count = len(data)  # equals ~10000 or less
max_batch = 50

# loop through 'data' with at most 50 entries in each iteration
for i in range(total_count):
    batch = data[i:i+50]  # first 50 entries
    result = process_data(batch)  # some time-consuming processing on 50 entries
    if result == True:
        # add to DB that these 50 entries were processed successfully!
        pass
    else:
        return 0  # quit the operation
        # later start again from the point where it failed,
        # say the 51st or 2560th or 9950th entry
```
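What should I do here so that the next iteration picks entries 51 to 100, and so on?

If the operation fails for some reason and is interrupted partway through, I need to restart the loop only from the batch that failed (based on the DB entry).

I can't work out the right logic. Should I keep two lists? Or something else?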
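You're close. Loop over the number of batches instead of the number of lines, and slice each batch by its index: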
```python
chunks = (total_count - 1) // 50 + 1  # number of batches, rounded up (0 if data is empty)
for i in range(chunks):
    batch = data[i*50:(i+1)*50]
```
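For the restart requirement, one option (a sketch, not the only way) is to step the loop by the batch size with `range(start, len(data), batch_size)` and persist a single integer offset instead of keeping two lists. In the sketch below, `process_in_batches` and `save_progress` are hypothetical names: `save_progress` stands in for whatever DB write you use to record progress, and `process_data` is assumed to return `True`/`False` as in the question.

```python
from typing import Callable, List, Optional


def process_in_batches(
    data: List[str],
    process_data: Callable[[List[str]], bool],
    save_progress: Callable[[int], None],
    batch_size: int = 50,
    start: int = 0,
) -> Optional[int]:
    """Process `data` in slices of `batch_size`, beginning at index `start`.

    Returns None if every batch succeeded, otherwise the index of the first
    entry of the failed batch (the offset to resume from on the next run).
    """
    # range() with a step yields start, start + 50, start + 100, ...
    # so each slice data[i:i + batch_size] is a distinct, non-overlapping batch.
    for i in range(start, len(data), batch_size):
        batch = data[i:i + batch_size]  # the last batch may be shorter
        if not process_data(batch):
            return i  # stop here; this batch will be retried next time
        # Hypothetical checkpoint: everything before i + len(batch) is done.
        save_progress(i + len(batch))
    return None


if __name__ == "__main__":
    lines = [f"entry {n}\n" for n in range(1, 121)]  # 120 fake entries
    saved = []  # stands in for the DB

    # First run: pretend the batch containing "entry 73" fails.
    failed_at = process_in_batches(lines, lambda b: "entry 73\n" not in b, saved.append)
    print(failed_at, saved)  # → 50 [50]

    # Second run: resume from the failed batch (entries 51-100, then 101-120).
    print(process_in_batches(lines, lambda b: True, saved.append, start=failed_at))  # → None
    print(saved)  # → [50, 100, 120]
```

On restart you would read the last saved offset from the DB and pass it as `start`; because only fully processed batches are checkpointed, the failed batch is retried from its first entry.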