I have been running this LSTM tutorial on the wikigold.conll NER dataset.
training_data contains a list of (sequence, tags) tuples, for example:
    training_data = [
        ("They also have a song called \" wake up \"".split(),
         ["O", "O", "O", "O", "O", "O", "I-MISC", "I-MISC", "I-MISC", "I-MISC"]),
        # six tokens after split(), so six tags
        ("Major General John C. Scheidt Jr.".split(),
         ["O", "O", "I-PER", "I-PER", "I-PER", "I-PER"]),
    ]
I wrote this function:
    import torch

    def predict(indices):
        """Takes a list of indices into training_data and yields the predicted tag indices for each sentence"""
        for index in indices:
            inputs = prepare_sequence(training_data[index][0], word_to_ix)
            tag_scores = model(inputs)
            # index of the highest-scoring tag for each word
            values, target = torch.max(tag_scores, 1)
            yield target
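This way I can get the predicted tags for specific indices in the training data.
But how do I evaluate the accuracy score over all of the training data?
Accuracy here is the number of correctly classified words across all sentences divided by the total word count.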
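This is what I came up with:

    # predict expects indices into training_data, not the sentences themselves
    y_pred = list(predict(range(len(training_data))))
    # assumes tag_to_ix, the tag-to-index mapping defined in the tutorial
    y_true = [prepare_sequence(t, tag_to_ix) for s, t in training_data]
    c = 0
    s = 0
    for i in range(len(training_data)):
        n = len(y_true[i])
        # super ugly and inefficient
        s += sum(sum(list(y_true[i].view(-1, n) == y_pred[i].view(-1, n).data)))
        c += n
    print('Training accuracy: {a}'.format(a=float(s) / c))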
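For reference, the accuracy definition above can be sketched on plain Python lists, independent of the model. This is only a minimal sketch, not code from the tutorial: token_accuracy is a hypothetical helper, the sample tags are made up, and it assumes the predictions have already been converted to per-sentence lists of tags aligned with the gold tags.

```python
def token_accuracy(y_true, y_pred):
    """Fraction of tokens whose predicted tag equals the gold tag.

    y_true and y_pred are lists of per-sentence tag lists
    (hypothetical helper, not part of the tutorial).
    """
    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(y_true, y_pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("gold and predicted sentences differ in length")
        # count the positions where the two tags agree
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    return correct / total if total else 0.0


# Made-up tags, for illustration only.
gold = [["O", "O", "I-MISC"], ["I-PER", "I-PER"]]
pred = [["O", "I-MISC", "I-MISC"], ["I-PER", "I-PER"]]
print(token_accuracy(gold, pred))
```

Counting matches and tokens across all sentences before dividing gives the per-word accuracy described above, rather than an average of per-sentence accuracies.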
PS: I have been trying to use sklearn's accuracy_score, without success.
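One possible cause, offered only as a guess since the error is not shown: accuracy_score compares two flat 1-D sequences of labels, whereas y_true and y_pred here are nested per sentence. Below is a minimal sketch of the flattening step using only the standard library; the flatten helper and the sample tags are hypothetical.

```python
from itertools import chain


def flatten(nested):
    """Concatenate per-sentence tag lists into one flat list (hypothetical helper)."""
    return list(chain.from_iterable(nested))


# Made-up tags, for illustration only.
gold = [["O", "O", "I-MISC"], ["I-PER", "I-PER"]]
pred = [["O", "I-MISC", "I-MISC"], ["I-PER", "I-PER"]]

flat_true = flatten(gold)
flat_pred = flatten(pred)

# Token-level accuracy computed by hand on the flat lists.
accuracy = sum(t == p for t, p in zip(flat_true, flat_pred)) / len(flat_true)
print(accuracy)
```

The flat lists could then be passed as accuracy_score(flat_true, flat_pred). If the predictions are tensors, they would first need converting to plain Python values, for example with .tolist().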