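I'm running into this AttributeError and I'm stuck on how to handle float values when they show up among the tweets. The streamed tweets have to be lowercased and tokenized, so I used split().
Can someone help me deal with this? Any workaround or solution?
Here is the error I get: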
    AttributeError                            Traceback (most recent call last)
    in ()
          1 stop_words = []
    ----> 2 negfeats = [(word_feats(x for x in p_test.SentimentText[f].lower().split() if x not in stop_words), 'neg') for f in l]
          3 posfeats = [(word_feats(x for x in p_test.SentimentText[f].lower().split() if x not in stop_words), 'pos') for f in p]
          4 
          5 trainfeats = negfeats+ posfeats

    AttributeError: 'float' object has no attribute 'lower'
This is my code:
    import random

    import pandas as pd

    p_test = pd.read_csv('TrainSA.csv')
    stop_words = []

    def word_feats(words):
        # Bag-of-words features: {word: True, ...}
        return dict([(word, True) for word in words])

    # Row indices of negative (Sentiment == 0) and positive (Sentiment == 1) tweets
    l = []
    for f in range(len(p_test)):
        if p_test.Sentiment[f] == 0:
            l.append(f)
    p = []
    for f in range(len(p_test)):
        if p_test.Sentiment[f] == 1:
            p.append(f)

    negfeats = [(word_feats(x for x in p_test.SentimentText[f].lower().split() if x not in stop_words), 'neg') for f in l]
    posfeats = [(word_feats(x for x in p_test.SentimentText[f].lower().split() if x not in stop_words), 'pos') for f in p]
    trainfeats = negfeats + posfeats
    print(len(trainfeats))

    random.shuffle(trainfeats)
    print(len(trainfeats))

    p_train = pd.read_csv('TrainSA.csv')
    l_t = []
    for f in range(len(p_train)):
        if p_train.Sentiment[f] == 0:
            l_t.append(f)
    p_t = []
    for f in range(len(p_train)):
        if p_train.Sentiment[f] == 1:
            p_t.append(f)
    print(len(l_t))
    print(len(p_t))
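To find which rows trigger the error, one option is to list the entries of `SentimentText` that are not strings. This is a minimal sketch, assuming the file has `Sentiment` and `SentimentText` columns as in the code above; the inline CSV data and the helper name `non_string_rows` are hypothetical, made up for illustration. Here the float comes from an empty cell, which `read_csv` parses as NaN (and NaN is a float).

```python
import io

import pandas as pd

# Hypothetical stand-in for TrainSA.csv: the third tweet is an empty cell.
csv_data = io.StringIO(
    "Sentiment,SentimentText\n"
    "0,I hate Mondays\n"
    "1,What a great day\n"
    "1,\n"
)
p_test = pd.read_csv(csv_data)


def non_string_rows(df, column):
    """Return the rows of df whose value in `column` is not a Python str."""
    mask = [not isinstance(value, str) for value in df[column]]
    return df.loc[mask]


bad = non_string_rows(p_test, "SentimentText")
print(bad)  # lists row 2, where SentimentText is NaN
print(type(bad["SentimentText"].iloc[0]))
```

Calling `p_test.SentimentText[2].lower()` on this sample raises the same `AttributeError: 'float' object has no attribute 'lower'`.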
I've tried many things, but I still can't get lower() and split() to work on these values.
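I think the root of your problem lies in the pd.read_csv('TrainSA.csv') call. You didn't post that routine, but I assume it is pandas' read_csv. It converts the input into Python data types automatically, which means that in your case some values can end up as floats. You can prevent this smart(?) behavior by specifying the data type you expect for each column.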
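A minimal sketch of declaring the column type up front, using a small inline CSV as a hypothetical stand-in for TrainSA.csv. One caveat worth knowing: a common way a float ends up in a text column is an empty cell, which `read_csv` parses as NaN, and `dtype={'SentimentText': str}` on its own still leaves such cells as NaN. The sketch therefore also passes `keep_default_na=False`, which keeps empty cells as empty strings instead.

```python
import io

import pandas as pd

# Hypothetical sample data: the second tweet is an empty cell.
CSV_TEXT = (
    "Sentiment,SentimentText\n"
    "0,I hate Mondays\n"
    "1,\n"
)

# Default parsing: the empty cell becomes NaN, which is a float.
default_df = pd.read_csv(io.StringIO(CSV_TEXT))

# Declared parsing: SentimentText is read as text, and empty cells
# stay '' instead of NaN because keep_default_na=False.
typed_df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"SentimentText": str},
    keep_default_na=False,
)

for text in typed_df["SentimentText"]:
    # Every value is now a str, so lower() and split() are safe.
    print(repr(text), text.lower().split())
```

With `keep_default_na=False`, no string is treated as missing in any column, so this fits best when the file has no genuinely missing numeric values.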
Thanks @Dick Kniep... yes, it's the pandas CSV reader, and your suggestion worked. After specifying the field's data type, the following code worked for me:
    p_test = pd.read_csv('TrainSA.csv')
    p_test.SentimentText = p_test.SentimentText.astype(str)
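One caveat with astype(str): at least in pandas 1.x and 2.x it turns a missing value into the literal string 'nan', so an empty tweet would contribute a {'nan': True} feature. If that matters, a possible alternative is to replace missing values with an empty string first. This is a sketch under that assumption, using a hypothetical two-row Series and the same feature format as `word_feats` in the question.

```python
import numpy as np
import pandas as pd


def word_feats(words):
    # Same feature format as in the question: {word: True, ...}
    return {word: True for word in words}


# Hypothetical column as read_csv might return it: the second tweet was an empty cell.
texts = pd.Series(["I hate Mondays", np.nan], name="SentimentText")

# Replace missing values with '' first, then make sure everything is a str.
cleaned = texts.fillna("").astype(str)

feats = [word_feats(t.lower().split()) for t in cleaned]
print(feats)
```

An empty tweet then yields an empty feature dict rather than a spurious 'nan' token.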