I have two files:
hyp.txt
    It is a guide to action which ensures that the military always obeys the commands of the party
    he read the book because he was interested in world history
ref.txt
    It is a guide to action that ensures that the military will forever heed Party commands
    he was interested in world history because he read the book
I have a function that does some computation to compare the texts line by line, e.g. line 1 of hyp.txt against line 1 of ref.txt.
    def scorer(list_of_tokenized_hyp, list_of_tokenized_ref):
        """
        :type list_of_tokenized_hyp: iter(iter(str))
        :type list_of_tokenized_ref: iter(iter(str))
        """
        score = 0  # placeholder; the real computation is omitted here
        for hypline, refline in zip(list_of_tokenized_hyp, list_of_tokenized_ref):
            # do something with the iter(str)
            pass
        return score
This function cannot be changed. However, I can control what I feed into it. Currently I am feeding the files into the function like this:
    with open('hyp.txt', 'r') as hypfin, open('ref.txt', 'r') as reffin:
        # Both list comprehensions read and split every line up front.
        hyp = [line.split() for line in hypfin]
        ref = [line.split() for line in reffin]
        scorer(hyp, ref)
But by doing so, I have loaded the whole of both files, along with all the split strings, into memory before feeding them into scorer().
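Knowing that scorer() processes the files line by line, is there a way to avoid materializing all the split strings before they are fed into the function, without changing scorer()?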
Is there a way to feed it some sort of generator instead?
I tried this:
    with open('hyp.txt', 'r') as hypfin, open('ref.txt', 'r') as reffin:
        # Generator expressions: nothing is split at this point.
        hyp = (h.split() for h in hypfin)
        ref = (r.split() for r in reffin)
        scorer(hyp, ref)
But I am not sure whether h.split() has already been materialized at that point. If it has, why? If not, why not?
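One way to check this empirically is to log each call to the split. The following is a minimal sketch, not part of my real code: `tracked_split` and the sample `lines` are hypothetical names introduced only for the experiment, and a plain list stands in for the open file.

```python
calls = []  # records every line that actually gets split


def tracked_split(line):
    # Hypothetical helper: behaves like line.split() but logs the call.
    calls.append(line)
    return line.split()


lines = ["he read the book", "he was interested in world history"]

# Building the generator expression does not run tracked_split at all.
tokens = (tracked_split(line) for line in lines)
calls_after_creation = len(calls)

# Each next() pulls one line and splits only that line.
first = next(tokens)
calls_after_first = len(calls)

print(calls_after_creation, calls_after_first, first)
```

If the counts come out as expected, the split appears to happen only when the consumer asks for the next item, not when the generator expression is created.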
If I could change the scorer() function, I could simply add these lines after the for statement:
    def scorer(list_of_tokenized_hyp, list_of_tokenized_ref):
        score = 0  # placeholder; the real computation is omitted here
        for hypline, refline in zip(list_of_tokenized_hyp, list_of_tokenized_ref):
            # Split each raw line only when the loop reaches it.
            hypline = hypline.split()
            refline = refline.split()
            # do something with the iter(str)
        return score
But in my case that is not possible, because I cannot change the function.
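To make the generator idea concrete, here is a minimal, self-contained sketch of what I have in mind. The `scorer` body below is only a stand-in (it counts tokens shared by each line pair) because the real computation is not shown, and `io.StringIO` objects with made-up sample text stand in for the two open files. It assumes Python 3, where `zip()` is lazy; under Python 2, `zip()` inside the scorer would build a full list anyway.

```python
import io


def scorer(list_of_tokenized_hyp, list_of_tokenized_ref):
    # Stand-in scorer (assumption): counts tokens shared by each line pair.
    score = 0
    for hypline, refline in zip(list_of_tokenized_hyp, list_of_tokenized_ref):
        score += len(set(hypline) & set(refline))
    return score


# io.StringIO stands in for open('hyp.txt') and open('ref.txt').
hypfin = io.StringIO("he read the book\nit is a guide\n")
reffin = io.StringIO("he read a book\nit is the guide\n")

# Generator expressions: each line is read and split only when zip() asks for it.
hyp = (line.split() for line in hypfin)
ref = (line.split() for line in reffin)

total = scorer(hyp, ref)
print(total)
```

With real files, the call to scorer() would have to stay inside the `with` block, since the generators read from the file objects only while they are being consumed.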