My sentence is as follows:
I want to ____ the car because it is cheap.
I want to use an NLP model to predict the missing word. Which NLP model should I use? Thanks.
Try this: https://github.com/huggingface/pytorch-pretrained-BERT
First, you have to set it up properly with
pip install -U pytorch-pretrained-bert
Then you can use the "masked language model" from the BERT algorithm, e.g.:
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM

# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
import logging
logging.basicConfig(level=logging.INFO)

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

text = '[CLS] I want to [MASK] the car because it is cheap . [SEP]'
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

# Find the position of the [MASK] token
masked_index = tokenized_text.index('[MASK]')

# Create the segments tensors.
segments_ids = [0] * len(tokenized_text)

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

# Load pre-trained model (weights)
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

# Predict all tokens
with torch.no_grad():
    predictions = model(tokens_tensor, segments_tensors)

predicted_index = torch.argmax(predictions[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]

print(predicted_token)
[out]:
buy
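If you want more than the single best candidate, you can rank several fillers at once. A minimal sketch that continues from the predictions, masked_index, and tokenizer variables above (the value k=5 is just an illustrative choice):

# Rank the k highest-scoring vocabulary items for the [MASK] position
k = 5  # illustrative choice; any small integer works
topk_scores, topk_indices = torch.topk(predictions[0, masked_index], k)
topk_tokens = tokenizer.convert_ids_to_tokens(topk_indices.tolist())
for token, score in zip(topk_tokens, topk_scores.tolist()):
    print(token, score)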
In long:

To really understand why you need the [CLS], [MASK], and segment tensors, please read the paper carefully: https://arxiv.org/abs/1810.04805
And if you're feeling lazy, you can read this nice blog post by Lilian Weng: https://lilianweng.github.io/lil-log/2019/01/31/generalized-language-models.html
Beyond BERT, there are many other models that can perform this fill-in-the-blank task. Do look at the other models in the pytorch-pretrained-BERT repository, but more importantly, dive deeper into the task of "language modelling", i.e. the task of predicting the next word given its history; a small sketch of that follows below.
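For a taste of plain next-word prediction, the same repository also ships an OpenAI GPT implementation. This is a minimal sketch, not part of the answer above: it assumes the 'openai-gpt' weights download the same way as the BERT weights did, and uses the OpenAIGPTTokenizer and OpenAIGPTLMHeadModel classes from the pytorch-pretrained-BERT repository:

import torch
from pytorch_pretrained_bert import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

# Load the pre-trained GPT tokenizer and language-model head
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt')
model.eval()

# Unlike BERT, GPT predicts the next word from the left-hand history only
text = 'I want to'
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))
tokens_tensor = torch.tensor([indexed_tokens])

with torch.no_grad():
    predictions = model(tokens_tensor)  # logits of shape [1, seq_len, vocab_size]

# The logits at the last position score every candidate next token
predicted_index = torch.argmax(predictions[0, -1, :]).item()
# Note: GPT uses BPE, so the printed token may carry a subword marker
print(tokenizer.convert_ids_to_tokens([predicted_index])[0])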