Is it possible to split a file? For example, I have a huge wordlist and want to split it into multiple files. How can this be done?
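This splits a file on newlines and writes the pieces back out. You can easily change the delimiter. It also handles uneven amounts, that is, an input file whose line count is not a multiple of splitLen (20 in this example).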
    splitLen = 20          # 20 lines per file
    outputBase = 'output'  # output1.txt, output2.txt, etc.

    # This is shorthand and not friendly with memory
    # on very large files (Sean Cavanagh), but it works.
    with open('input.txt', 'r') as f:
        input = f.read().split('\n')

    at = 1
    for lines in range(0, len(input), splitLen):
        # First, get the list slice
        outputData = input[lines:lines+splitLen]

        # Now open the output file, join the new slice with newlines
        # and write it out. The with block closes the file.
        with open(outputBase + str(at) + '.txt', 'w') as output:
            output.write('\n'.join(outputData))

        # Increment the counter
        at += 1
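The slicing step copes with uneven counts because a slice that runs past the end of a list is simply shorter. Below is a minimal sketch of that idea with a configurable delimiter. The helper name `chunk_lines` and its parameters are hypothetical, introduced only for illustration, and are not part of the answer above.

```python
def chunk_lines(text, split_len, delimiter='\n'):
    """Split text on delimiter and group the pieces into lists of split_len."""
    pieces = text.split(delimiter)
    # The last slice may hold fewer than split_len items
    return [pieces[i:i + split_len] for i in range(0, len(pieces), split_len)]


words = 'a,b,c,d,e'
chunks = chunk_lines(words, 2, delimiter=',')
print(chunks)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Five items split two at a time give two full groups and one short group, which mirrors what happens to the last output file.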
A better loop for sli's example, one that does not hog memory:
    splitLen = 20          # 20 lines per file
    outputBase = 'output'  # output0.txt, output1.txt, etc.

    count = 0
    at = 0
    dest = None
    with open('input.txt', 'r') as input:
        for line in input:
            if count % splitLen == 0:
                # Close the previous chunk and start the next one
                if dest:
                    dest.close()
                dest = open(outputBase + str(at) + '.txt', 'w')
                at += 1
            dest.write(line)
            count += 1

    # Close the last output file
    if dest:
        dest.close()
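The same streaming idea can also be written with `itertools.islice`, which pulls at most a fixed number of lines from the open file on each pass. This is only a sketch under assumptions: the function name `split_file`, its parameters, and the temporary demo files are hypothetical and not taken from the answers above.

```python
import itertools
import os
import tempfile


def split_file(path, split_len, output_base):
    """Write consecutive groups of split_len lines to output_base0.txt, ..."""
    names = []
    with open(path, 'r') as src:
        for index in itertools.count():
            # islice takes at most split_len lines from the file iterator
            lines = list(itertools.islice(src, split_len))
            if not lines:
                break
            name = '%s%d.txt' % (output_base, index)
            with open(name, 'w') as dest:
                dest.writelines(lines)
            names.append(name)
    return names


# Demo in a temporary directory: 5 lines, 2 lines per file
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'input.txt')
with open(source, 'w') as f:
    f.write('one\ntwo\nthree\nfour\nfive\n')

parts = split_file(source, 2, os.path.join(workdir, 'output'))
print(len(parts))  # 3
```

Only one group of lines is held in memory at a time, and every output file is closed by its `with` block, including the last one.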
A solution for splitting a binary file into chapters named .000, .001, etc.:
    FILE = 'scons-conversion.7z'

    MAX = 500 * 1024 * 1024  # 500 MB - max chapter size
    BUF = 50 * 1024 * 1024   # 50 MB - memory buffer size

    chapters = 0
    with open(FILE, 'rb') as src:
        chunk = src.read(min(BUF, MAX))
        while chunk:
            # Start a new chapter: FILE.000, FILE.001, ...
            with open(FILE + '.%03d' % chapters, 'wb') as tgt:
                written = 0
                while chunk:
                    tgt.write(chunk)
                    written += len(chunk)
                    if written >= MAX:
                        break
                    chunk = src.read(min(BUF, MAX - written))
            # Read ahead so that no empty chapter is created at end of file
            chunk = src.read(min(BUF, MAX))
            chapters += 1
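To check a split like this, the chapters can be concatenated back in order and compared with the original. The following is a rough sketch, not part of the answer above: the helper `join_chapters`, its `buf` parameter, and the demo file names are all hypothetical, and it assumes chapters are numbered consecutively from .000 with no gaps.

```python
import os
import tempfile


def join_chapters(base, target, buf=1024 * 1024):
    """Concatenate base.000, base.001, ... into target; return chapter count."""
    count = 0
    with open(target, 'wb') as out:
        # Stop at the first chapter number that does not exist
        while os.path.exists('%s.%03d' % (base, count)):
            with open('%s.%03d' % (base, count), 'rb') as part:
                while True:
                    chunk = part.read(buf)
                    if not chunk:
                        break
                    out.write(chunk)
            count += 1
    return count


# Demo with three tiny hand-made chapters in a temporary directory
workdir = tempfile.mkdtemp()
base = os.path.join(workdir, 'demo.bin')
for i, data in enumerate([b'abc', b'def', b'g']):
    with open('%s.%03d' % (base, i), 'wb') as f:
        f.write(data)

joined = os.path.join(workdir, 'joined.bin')
n = join_chapters(base, joined)
with open(joined, 'rb') as f:
    result = f.read()
print(n, result)
```

Reading each chapter in buffered pieces keeps memory use bounded by `buf`, whatever the chapter size.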