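I have a book in a text file, and I need to print the first paragraph of each section. I thought that if I found the text between \n\n and \n, I would have my answer. Here is my code, but it doesn't work. Can you tell me where I went wrong?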
    lines = [line.rstrip('\n') for line in open('G:\\aa.txt')]

    check = -1
    first = 0
    last = 0
    for i in range(len(lines)):
        if lines[i] == "":
            if lines[i+1] == "":
                check = 1
                first = i + 2
        if i+2 < len(lines):
            if lines[i+2] == "" and check == 1:
                last = i+2
    while (first < last):
        print(lines[first])
        first = first + 1
I also found some code on Stack Overflow and tried it, but it only printed an empty array.
    f = open("G:\\aa.txt").readlines()
    flag = False
    for line in f:
        if line.startswith('\n\n'):
            flag = False
        if flag:
            print(line)
        elif line.strip().endswith('\n'):
            flag = True
I have shared a sample section of the book below.
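I
THE LAY OF THE LAND
There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.
Of all the kinds of interest attaching to the study of the world's wild animals, there are none that surpass the study of their minds, their morals, and the acts that they perform as the results of their mental processes.
II
WILD ANIMAL TEMPERAMENT & INDIVIDUALITY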
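What I am trying to do here is find the uppercase lines and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of the elements of this array.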
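The output should look like this:
There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.
What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.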
If you want to group the sections, you can use itertools.groupby with empty lines as the delimiters:
    from itertools import groupby

    with open("in.txt") as f:
        # k is True for runs of non-blank lines, False for runs of blank lines
        for k, sec in groupby(f, key=lambda x: bool(x.strip())):
            if k:
                print(list(sec))
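To see what groupby hands back with this key, here is a minimal sketch that runs on an in-memory list instead of a file. The `sample` lines and the helper name `non_blank_groups` are made up for illustration and are not part of the original code.

```python
from itertools import groupby

# Hypothetical sample: a title, two blank lines, then two paragraphs.
sample = [
    "THE LAY OF THE LAND\n",
    "\n",
    "\n",
    "First paragraph.\n",
    "\n",
    "Second paragraph.\n",
]


def non_blank_groups(lines):
    """Return each run of consecutive non-blank lines as a list."""
    return [
        list(group)
        for has_text, group in groupby(lines, key=lambda line: bool(line.strip()))
        if has_text  # drop the runs of blank lines
    ]


groups = non_blank_groups(sample)
print(groups)
```

Consecutive blank lines share the key False, so they collapse into a single group that is skipped. Each title or paragraph comes back as its own list.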
With a bit more itertools foo, we can get the sections using the uppercase titles as the delimiters:
    from itertools import groupby, takewhile

    with open("in.txt") as f:
        grps = groupby(f, key=lambda x: x.isupper())
        for k, sec in grps:
            # if we hit a title line
            if k:
                # pull all paragraphs
                v = next(grps)[1]
                # skip two empty lines after title
                next(v, ""), next(v, "")
                # take all lines up to next empty line/second paragraph
                print(list(takewhile(lambda x: bool(x.strip()), v)))
Which would give you:
    ['There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.\n']
    ['What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.']
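Each section starts with an all-uppercase title, so once we hit one we know that two empty lines follow, then the first paragraph, and then the pattern repeats.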
Broken down into an explicit loop:
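    from itertools import groupby

    def parse_sec(bk):
        with open(bk) as f:
            grps = groupby(f, key=lambda x: bool(x.isupper()))
            for k, sec in grps:
                # k is True when the group holds an uppercase title line
                if k:
                    print("First paragraph from section titled :{}".format(next(sec).rstrip()))
                    # the next group holds everything up to the next title
                    v = next(grps)[1]
                    # skip the two empty lines after the title
                    next(v, ""), next(v, "")
                    # print lines until the empty line that ends the paragraph
                    for line in v:
                        if not line.strip():
                            break
                        print(line)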
For your text:
    In [11]: cat -E in.txt
    THE LAY OF THE LAND$
    $
    $
    There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.$
    $
    Of all the kinds of interest attaching to the study of the world's wild animals, there are none that surpass the study of their minds, their morals, and the acts that they perform as the results of their mental processes.$
    $
    $
    WILD ANIMAL TEMPERAMENT & INDIVIDUALITY$
    $
    $
    What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.
The dollar signs mark the newlines. The output is:
    In [12]: parse_sec("in.txt")
    First paragraph from section titled :THE LAY OF THE LAND
    There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.

    First paragraph from section titled :WILD ANIMAL TEMPERAMENT & INDIVIDUALITY
    What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.
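The index-based idea from the question (collect the uppercase lines, then work from their positions) can also be written without itertools. What follows is only a sketch: it assumes the whole file fits in memory as a list of lines, and the function name `first_paragraphs` and the sample data are hypothetical. Uppercase lines with no paragraph before the next uppercase line, such as the Roman numerals "I" and "II" in the question's sample, are skipped.

```python
def first_paragraphs(lines):
    """Return (title, first_paragraph) pairs, one per uppercase title line."""
    # Indexes of all-uppercase lines; str.isupper() is False for blank lines.
    title_idx = [i for i, line in enumerate(lines) if line.isupper()]
    results = []
    for n, start in enumerate(title_idx):
        # A section runs up to the next uppercase line, or to the end of input.
        end = title_idx[n + 1] if n + 1 < len(title_idx) else len(lines)
        paragraph = []
        for line in lines[start + 1:end]:
            if line.strip():
                paragraph.append(line.strip())
            elif paragraph:
                break  # the first blank line after the paragraph ends it
        if paragraph:  # skip uppercase lines that have no text under them
            results.append((lines[start].strip(), " ".join(paragraph)))
    return results


# Hypothetical sample data shaped like the excerpt in the question.
sample = [
    "I\n", "\n",
    "THE LAY OF THE LAND\n", "\n", "\n",
    "First one.\n", "still first.\n", "\n",
    "Second.\n", "\n", "\n",
    "II\n", "\n",
    "WILD ANIMALS\n", "\n", "\n",
    "Only paragraph.",
]

pairs = first_paragraphs(sample)
for title, paragraph in pairs:
    print("{}: {}".format(title, paragraph))
```

For a real file, something like `first_paragraphs(open("in.txt").readlines())` should behave the same way. Unlike the groupby version, it does not assume exactly two blank lines after each title.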