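So I have a list of words in a text file. I want to lemmatize them to remove words that have the same meaning but are in different tenses, like try, tried, etc. When I do this, I keep getting an error like TypeError: unhashable type: 'list'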
    results=[]
    with open('/Users/xyz/Documents/something5.txt', 'r') as f:
        for line in f:
            results.append(line.strip().split())

    lemma= WordNetLemmatizer()
    lem=[]
    for r in results:
        lem.append(lemma.lemmatize(r))

    with open("lem.txt","w") as t:
        for item in lem:
            print>>t, item
How do I lemmatize words that have already been tokenized?
The method WordNetLemmatizer.lemmatize expects a single word as a string, but you are passing it a list of strings. That is what raises the TypeError exception.
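To see where the message comes from, here is a toy, hypothetical dictionary-backed lookup. It is not NLTK's code; the assumption is only that the lemmatizer, like WordNet's morphology, looks words up in a dict somewhere. A dict has to hash its key, and a list cannot be hashed:

```python
# Hypothetical stand-in for a dictionary-backed lemma lookup (not NLTK's code).
EXCEPTIONS = {"tried": "try", "trying": "try"}


def toy_lemmatize(word):
    # dict.get hashes its key; a list is unhashable, so passing one raises TypeError
    return EXCEPTIONS.get(word, word)


print(toy_lemmatize("tried"))  # try

try:
    toy_lemmatize(["tried", "trying"])
except TypeError as err:
    # the message names the unhashable type, e.g. "unhashable type: 'list'"
    print(err)
```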
The result of line.split() is a list of strings, and you append that whole list as a single element of the results list.
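The difference between append and extend is easy to check on a couple of sample lines (the sample data here is made up for illustration):

```python
lines = ["try tried\n", "trying tries\n"]

appended = []
extended = []
for line in lines:
    # append adds the whole list returned by split() as one element
    appended.append(line.strip().split())
    # extend adds each word from that list as its own element
    extended.extend(line.strip().split())

print(appended)  # [['try', 'tried'], ['trying', 'tries']]
print(extended)  # ['try', 'tried', 'trying', 'tries']
```

Only the extend version gives you a flat list of strings that can be passed to lemmatize one at a time.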
You want results.extend(line.strip().split()) instead, which adds each word to results individually:
    from nltk.stem import WordNetLemmatizer

    results = []
    with open('/Users/xyz/Documents/something5.txt', 'r') as f:
        for line in f:
            # extend adds each word separately instead of one list per line
            results.extend(line.strip().split())

    lemma = WordNetLemmatizer()
    lem = map(lemma.lemmatize, results)

    with open("lem.txt", "w") as t:
        for item in lem:
            print(item, file=t)
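Since the goal is to collapse forms like try/tried, two details may matter. WordNetLemmatizer.lemmatize takes a pos argument that defaults to 'n' (noun), so verb forms such as "tried" generally need pos='v' to reduce to "try". And the lemmas still have to be de-duplicated afterwards. A minimal sketch, assuming order-preserving de-duplication is what you want; unique_lemmas and stub_lemmatize are hypothetical helpers, and the lemmatizer is passed in as a parameter so the sketch runs without NLTK:

```python
def unique_lemmas(words, lemmatize):
    """Return each distinct lemma once, in order of first appearance."""
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops repeated keys
    return list(dict.fromkeys(lemmatize(word) for word in words))


# Hypothetical stand-in lemmatizer for illustration. With NLTK you might use:
#   lemmatizer = WordNetLemmatizer()
#   unique_lemmas(results, lambda w: lemmatizer.lemmatize(w, pos='v'))
STUB_LEMMAS = {"tried": "try", "trying": "try", "tries": "try"}


def stub_lemmatize(word):
    return STUB_LEMMAS.get(word, word)


print(unique_lemmas(["try", "tried", "trying", "cat"], stub_lemmatize))
# ['try', 'cat']
```

Using a dict rather than a set keeps the words in the order they appeared in the file.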
Or, refactored without the intermediate results list:
    from nltk.stem import WordNetLemmatizer

    def words(fname):
        # yield one word at a time so no intermediate list is built
        with open(fname, 'r') as document:
            for line in document:
                for word in line.strip().split():
                    yield word

    lemma = WordNetLemmatizer()
    lem = map(lemma.lemmatize, words('/Users/xyz/Documents/something5.txt'))
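The refactored version stops at building lem. A sketch of streaming those lemmas straight into an output file, assuming one lemma per line as in the question; lemmatize_file is a hypothetical helper, and the lemmatizer is passed in (for example lemma.lemmatize) so the pipeline can be exercised with a stand-in such as str.lower:

```python
import os
import tempfile


def words(fname):
    # yield words one at a time, as in the refactored answer
    with open(fname, "r") as document:
        for line in document:
            for word in line.strip().split():
                yield word


def lemmatize_file(src, dst, lemmatize):
    # map is lazy in Python 3, so each word is read, lemmatized and written in turn
    with open(dst, "w") as out:
        for item in map(lemmatize, words(src)):
            print(item, file=out)


# Usage with a stand-in lemmatizer (str.lower) and temporary files
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "words.txt")
    dst = os.path.join(tmp, "lem.txt")
    with open(src, "w") as f:
        f.write("Try Tried\nTrying\n")
    lemmatize_file(src, dst, str.lower)
    with open(dst) as f:
        print(f.read().split())  # ['try', 'tried', 'trying']
```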