1. I still don't understand what n.01 means and why it is necessary.
According to here and the NLTK source, the name has the form "WORD.PART-OF-SPEECH.SENSE-NUMBER".
Quoting the source:

    Create a Lemma from a "<word>.<pos>.<number>.<lemma>" string where:
    <word> is the morphological stem identifying the synset
    <pos> is one of the module attributes ADJ, ADJ_SAT, ADV, NOUN or VERB
    <number> is the sense number, counting from 0.
    <lemma> is the morphological form of interest

The n stands for noun. I also recommend reading about the WordNet dataset.
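To make the naming scheme concrete, here is a minimal sketch that splits a synset name into its three parts using plain string handling. The helper `parse_synset_name` and the `POS_LABELS` table are names I made up for illustration; they are not part of NLTK. The one-letter tags are the values of the NLTK module attributes mentioned in the quote.

```python
# Hypothetical helper, not an NLTK function.
def parse_synset_name(name):
    """Split a name such as 'dog.n.01' into (word, pos, sense_number)."""
    # Split from the right so that a word containing dots stays intact.
    word, pos, number = name.rsplit('.', 2)
    return word, pos, int(number)

# One-letter part-of-speech tags used in synset names.
POS_LABELS = {
    'n': 'noun',
    'v': 'verb',
    'a': 'adjective',
    's': 'adjective satellite',
    'r': 'adverb',
}

word, pos, number = parse_synset_name('dog.n.01')
print(word, POS_LABELS[pos], number)  # dog noun 1
```

So dog.n.01 reads as "the word dog, as a noun, sense number 1".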
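2. Is there a way to visually show the computed path between two terms?
Take a look at the similarity section of the NLTK WordNet documentation. You have several path-based algorithms to choose from (and you can try combining a few of them).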
A few examples from the NLTK docs:

    from nltk.corpus import wordnet as wn

    # Pick one specific sense of each word
    dog = wn.synset('dog.n.01')
    cat = wn.synset('cat.n.01')

    # Three of the available similarity measures
    print(dog.path_similarity(cat))
    print(dog.lch_similarity(cat))
    print(dog.wup_similarity(cat))

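To give an idea of what "the path between two terms" looks like, here is a rough sketch on a hand-written toy taxonomy. The `TOY_HYPERNYMS` dictionary and the `shortest_path` helper are my own assumptions for illustration and do not query WordNet. The last line applies the path similarity formula, 1 / (1 + shortest path length).

```python
from collections import deque

# Hand-written toy "is-a" links (child -> parents), for illustration only.
TOY_HYPERNYMS = {
    'dog': ['canine'],
    'canine': ['carnivore'],
    'cat': ['feline'],
    'feline': ['carnivore'],
    'carnivore': [],
}

def shortest_path(graph, start, goal):
    """Breadth-first search over the is-a links, ignoring their direction."""
    neighbours = {node: set(parents) for node, parents in graph.items()}
    for node, parents in graph.items():
        for parent in parents:
            neighbours.setdefault(parent, set()).add(node)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection between the two terms

path = shortest_path(TOY_HYPERNYMS, 'dog', 'cat')
print(' -> '.join(path))  # dog -> canine -> carnivore -> feline -> cat

# Path similarity: 1 / (1 + number of edges on the shortest path)
similarity = 1 / (1 + (len(path) - 1))
print(similarity)  # 0.2
```

Printing the node sequence like this is the simplest "visualization" of a path; a graph drawing library could render the same list of nodes.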
For visualization, you can build a pairwise matrix M[i,j], where:

    M[i,j] = word_similarity(i, j)

and plot it using the following Stack Overflow answer.
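Building the matrix can be sketched as follows. Both function names are hypothetical, and `toy_similarity` is only a stand-in (letter overlap) so that the snippet runs without any corpus; in practice you would pass a function that wraps one of the WordNet measures, for example `lambda a, b: a.path_similarity(b)` over a list of synsets.

```python
def similarity_matrix(items, similarity):
    """Return M as a list of lists with M[i][j] = similarity(items[i], items[j])."""
    return [[similarity(a, b) for b in items] for a in items]

def toy_similarity(a, b):
    # Stand-in measure (Jaccard overlap of letters), NOT a semantic metric.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

words = ['dog', 'cat', 'cot']
M = similarity_matrix(words, toy_similarity)

# Print the matrix as a small text table, one row per word.
for word, row in zip(words, M):
    print(word, ' '.join('%.2f' % value for value in row))
```

The resulting list of lists can be handed to a plotting routine (a heatmap, for instance) to get the picture.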
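3. Which other NLTK semantic metrics can I use?
As mentioned above, there are several ways to compute word similarity. I also suggest looking into gensim. I have used its word2vec implementation for word similarity, and it worked well for me.
If you need help choosing an algorithm, please provide more information about the problem you are facing.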
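For context, similarity between word vectors is usually measured with cosine similarity. The sketch below shows only that computation; the three-dimensional vectors in `TOY_VECTORS` are made up for illustration (real word2vec vectors are learned from a corpus and have far more dimensions), and nothing here calls gensim.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors, for illustration only.
TOY_VECTORS = {
    'dog': [0.9, 0.8, 0.1],
    'cat': [0.8, 0.9, 0.2],
    'car': [0.1, 0.2, 0.9],
}

dog_cat = cosine_similarity(TOY_VECTORS['dog'], TOY_VECTORS['cat'])
dog_car = cosine_similarity(TOY_VECTORS['dog'], TOY_VECTORS['car'])
print(dog_cat > dog_car)  # True: 'dog' points in nearly the same direction as 'cat'
```

Unlike the WordNet measures, this approach works on plain words and does not require choosing a sense first.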
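For more information about the sense number, see here:
Senses in WordNet are generally ordered from most to least frequently used, with the most common sense numbered 1 ...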
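The problem is that "dog" is ambiguous, and you have to choose the correct sense for it.
You can pick the first sense as a naive approach, or devise your own algorithm for choosing the right sense based on your application or research.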
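To get all the available definitions of a word from WordNet (called synsets in the WordNet docs), you can simply call wn.synsets(word).
I encourage you to dig into the metadata that each of these synsets carries.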
The code below is a simple example that retrieves this metadata and prints it nicely.

    from nltk.corpus import wordnet as wn

    # All senses (synsets) in which the word 'dog' appears
    dog_synsets = wn.synsets('dog')

    for i, syn in enumerate(dog_synsets):
        print('%d. %s' % (i, syn.name()))
        print('alternative names (lemmas): "%s"' % '", "'.join(syn.lemma_names()))
        print('definition: "%s"' % syn.definition())
        # Not every synset comes with example sentences
        if syn.examples():
            print('example usage: "%s"' % '", "'.join(syn.examples()))
        print('\n')

Code output:

    0. dog.n.01
    alternative names (lemmas): "dog", "domestic_dog", "Canis_familiaris"
    definition: "a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds"
    example usage: "the dog barked all night"

    1. frump.n.01
    alternative names (lemmas): "frump", "dog"
    definition: "a dull unattractive unpleasant girl or woman"
    example usage: "she got a reputation as a frump", "she's a real dog"

    2. dog.n.03
    alternative names (lemmas): "dog"
    definition: "informal term for a man"
    example usage: "you lucky dog"

    3. cad.n.01
    alternative names (lemmas): "cad", "bounder", "blackguard", "dog", "hound", "heel"
    definition: "someone who is morally reprehensible"
    example usage: "you dirty dog"

    4. frank.n.02
    alternative names (lemmas): "frank", "frankfurter", "hotdog", "hot_dog", "dog", "wiener", "wienerwurst", "weenie"
    definition: "a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll"

    5. pawl.n.01
    alternative names (lemmas): "pawl", "detent", "click", "dog"
    definition: "a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward"

    6. andiron.n.01
    alternative names (lemmas): "andiron", "firedog", "dog", "dog-iron"
    definition: "metal supports for logs in a fireplace"
    example usage: "the andirons were too hot to touch"

    7. chase.v.01
    alternative names (lemmas): "chase", "chase_after", "trail", "tail", "tag", "give_chase", "dog", "go_after", "track"
    definition: "go after with the intent to catch"
    example usage: "The policeman chased the mugger down the alley", "the dog chased the rabbit"

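Coming back to the question of choosing the right sense of "dog": one possible starting point is a simplified Lesk-style heuristic that picks the sense whose definition shares the most words with the surrounding sentence. The sketch below is only an illustration under my own assumptions: `GLOSSES` hard-codes three definitions copied from the output above so that it runs without NLTK, and `STOPWORDS`, `tokens` and `pick_sense` are hypothetical names. Ties go to the first sense listed.

```python
# Definitions copied from the output above (three senses only, for brevity).
GLOSSES = {
    'dog.n.01': 'a member of the genus Canis (probably descended from the '
                'common wolf) that has been domesticated by man since '
                'prehistoric times; occurs in many breeds',
    'frank.n.02': 'a smooth-textured sausage of minced beef or pork usually '
                  'smoked; often served on a bread roll',
    'andiron.n.01': 'metal supports for logs in a fireplace',
}

# A tiny hand-picked stopword list; an assumption, not a standard resource.
STOPWORDS = {'a', 'an', 'the', 'of', 'in', 'on', 'for', 'by', 'or', 'that',
             'has', 'been', 'to', 'with', 'and', 'i', 'is'}

def tokens(text):
    """Lower-case the text, drop punctuation and stopwords, return a set of words."""
    cleaned = ''.join(ch.lower() if ch.isalnum() else ' ' for ch in text)
    return {word for word in cleaned.split() if word not in STOPWORDS}

def pick_sense(context, glosses):
    """Return the sense name whose gloss overlaps most with the context."""
    context_words = tokens(context)
    return max(glosses, key=lambda name: len(context_words & tokens(glosses[name])))

print(pick_sense('I ate a dog with mustard on a bread roll', GLOSSES))   # frank.n.02
print(pick_sense('put more logs on the dog in the fireplace', GLOSSES))  # andiron.n.01
```

A heuristic this simple is easy to fool, so treat it as a baseline to compare against the naive first-sense choice rather than as a finished solution.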