Word similarity with WordNet in Python

1> ShmulikA..:

1. I still don't get what n.01 means and why it is necessary.

From here and from the nltk source, the name has the form "WORD.PART-OF-SPEECH.SENSE-NUMBER".

Quoting the source:

Create a Lemma from a "<word>.<pos>.<number>.<lemma>" string where:
 <word> is the morphological stem identifying the synset
 <pos> is one of the module attributes ADJ, ADJ_SAT, ADV, NOUN or VERB
 <number> is the sense number, counting from 0.
 <lemma> is the morphological form of interest

n stands for noun. I also suggest reading up on the WordNet dataset.
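To make the naming scheme concrete, here is a minimal sketch that splits a synset name into its three parts. The helper parse_synset_name and the POS_NAMES table are illustrative names of my own, not part of nltk; the single-letter codes are the ones WordNet uses for parts of speech.

```python
# Letters WordNet uses for parts of speech (nltk exposes them as
# wn.NOUN, wn.VERB, wn.ADJ, wn.ADJ_SAT and wn.ADV).
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def parse_synset_name(name):
    """Split a name such as 'dog.n.01' into (word, part of speech, sense number)."""
    # Split from the right so a word that itself contains a dot stays intact.
    word, pos, sense = name.rsplit(".", 2)
    return word, POS_NAMES.get(pos, pos), int(sense)


print(parse_synset_name("dog.n.01"))
print(parse_synset_name("chase.v.01"))
```

The part of speech is needed because the same spelling can be a noun and a verb, and the sense number is needed because one part of speech can still carry several meanings.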

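2. Is there a way to visually show the computed path between two terms?

Take a look at the similarity section of the nltk WordNet documentation. There are several path-based algorithms to choose from (you can also try combining a few).

A few examples from the nltk docs: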

from nltk.corpus import wordnet as wn
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

print(dog.path_similarity(cat))
print(dog.lch_similarity(cat))
print(dog.wup_similarity(cat))
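To give a feel for what a path-based score measures, here is a toy sketch that does not use nltk at all. As far as I understand it, path_similarity is 1 / (shortest path length + 1) in the is-a taxonomy; the code below applies that formula to a tiny hand-written tree. TOY_HYPERNYMS, shortest_path and toy_path_similarity are made-up names, and the tree is not WordNet data.

```python
from collections import deque

# Hand-written, heavily simplified hypernym links (child -> parent).
# This is NOT WordNet data; it only mimics the shape of a taxonomy.
TOY_HYPERNYMS = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "bird": "animal",
}


def shortest_path(a, b, hypernyms=TOY_HYPERNYMS):
    """Return the shortest list of nodes linking a to b, or None."""
    # Build an undirected neighbour map from the child -> parent links.
    neighbours = {}
    for child, parent in hypernyms.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    # Breadth-first search, remembering the path taken to reach each node.
    queue = deque([[a]])
    seen = {a}
    while queue:
        path = queue.popleft()
        if path[-1] == b:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def toy_path_similarity(a, b):
    """1 / (edges on the shortest path + 1), or None if unconnected."""
    path = shortest_path(a, b)
    if path is None:
        return None
    return 1.0 / len(path)  # len(path) equals the number of edges + 1


print(" -> ".join(shortest_path("dog", "cat")))
print(toy_path_similarity("dog", "cat"))
```

Printing the node list is already a crude way to "see" the path between two terms; with real synsets, methods such as hypernym_paths() and lowest_common_hypernyms() expose comparable information.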

For visualization, you can build a distance matrix M[i,j], where:

M[i,j] = word_similarity(i, j)

and then plot it using the following stackoverflow answer.
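A rough sketch of building such a matrix follows. The function similarity_matrix is a name I made up, and char_overlap is only a stand-in measure so that the example runs without any corpus; in practice you would pass a list of synsets and something like lambda a, b: a.path_similarity(b).

```python
def similarity_matrix(items, similarity):
    """Return M as a list of rows with M[i][j] = similarity(items[i], items[j])."""
    matrix = []
    for a in items:
        row = []
        for b in items:
            score = similarity(a, b)
            # Some measures return None when no path exists; store 0.0 instead.
            row.append(0.0 if score is None else score)
        matrix.append(row)
    return matrix


def char_overlap(a, b):
    """Stand-in measure: Jaccard overlap of the letters in two words."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


words = ["dog", "cat", "cot"]
M = similarity_matrix(words, char_overlap)
for word, row in zip(words, M):
    print(word, ["%.2f" % x for x in row])
```

If the plotting method expects distances rather than similarities, 1 - M[i][j] is a simple conversion for scores that lie between 0 and 1.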

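3. Which other nltk semantic metrics can I use?

As mentioned above, there are several ways to compute word similarity. I also suggest looking into gensim. I used its word2vec implementation for word similarity and it worked well for me.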
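For context, embedding-based approaches such as word2vec usually score two words by the cosine similarity of their vectors. The sketch below shows only that final step, in plain Python; the three-dimensional vectors are invented for illustration and do not come from any trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional vectors, purely for illustration; a trained
# word2vec model would supply vectors with far more dimensions.
toy_vectors = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(cosine_similarity(toy_vectors["dog"], toy_vectors["cat"]))
print(cosine_similarity(toy_vectors["dog"], toy_vectors["car"]))
```

Unlike the WordNet measures, this kind of score works on word forms rather than senses, so it does not require picking a synset first.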

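If you need help choosing an algorithm, please provide more information about the problem you are facing.

Update:

More information about what the sense number means can be found here:

Senses in WordNet are generally ordered from most to least frequently used, with the most common sense numbered 1 ...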

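The problem is that "dog" is ambiguous, and you have to choose the right meaning for it.

You can pick the first sense as a naive approach, or come up with your own algorithm for choosing the right meaning, depending on your application or research.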
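Both options can be sketched in a few lines. first_sense is the naive choice; most_similar_senses is one possible heuristic (my own illustration, not something nltk provides) that picks the pair of senses with the highest similarity. The sense names are plain strings here and the scores in toy_scores are invented, so the example runs without WordNet installed.

```python
def first_sense(senses):
    """Naive choice: the first listed sense, or None if there are none."""
    return senses[0] if senses else None


def most_similar_senses(senses_a, senses_b, similarity):
    """Return (score, sense_a, sense_b) for the highest-scoring pair, or None."""
    best = None
    for a in senses_a:
        for b in senses_b:
            score = similarity(a, b)
            if score is None:  # measure undefined for this pair
                continue
            if best is None or score > best[0]:
                best = (score, a, b)
    return best


# Stand-in data: sense names as plain strings and invented scores.
toy_scores = {
    ("dog.n.01", "cat.n.01"): 0.2,
    ("frank.n.02", "cat.n.01"): 0.05,
}


def toy_similarity(a, b):
    """Look up an invented score; None means 'no score available'."""
    return toy_scores.get((a, b))


dog_senses = ["dog.n.01", "frank.n.02", "chase.v.01"]
cat_senses = ["cat.n.01"]
print(first_sense(dog_senses))
print(most_similar_senses(dog_senses, cat_senses, toy_similarity))
```

With nltk, the same functions should accept wn.synsets('dog'), wn.synsets('cat') and lambda a, b: a.path_similarity(b); the None check matters because the similarity methods can return None when no path connects two synsets.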

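To get all available definitions of a word from WordNet (called synsets in the WordNet docs), you can simply call wn.synsets(word).

I encourage you to dig into the metadata that each of these synsets carries.

The code below is a simple example that fetches this metadata and prints it in a readable form.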

from nltk.corpus import wordnet as wn

dog_synsets = wn.synsets('dog')

for i, syn in enumerate(dog_synsets):
    print('%d. %s' % (i, syn.name()))
    print('alternative names (lemmas): "%s"' % '", "'.join(syn.lemma_names()))
    print('definition: "%s"' % syn.definition())
    if syn.examples():
        print('example usage: "%s"' % '", "'.join(syn.examples()))
    print('\n')

Output:

0. dog.n.01
alternative names (lemmas): "dog", "domestic_dog", "Canis_familiaris"
definition: "a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds"
example usage: "the dog barked all night"


1. frump.n.01
alternative names (lemmas): "frump", "dog"
definition: "a dull unattractive unpleasant girl or woman"
example usage: "she got a reputation as a frump", "she's a real dog"


2. dog.n.03
alternative names (lemmas): "dog"
definition: "informal term for a man"
example usage: "you lucky dog"


3. cad.n.01
alternative names (lemmas): "cad", "bounder", "blackguard", "dog", "hound", "heel"
definition: "someone who is morally reprehensible"
example usage: "you dirty dog"


4. frank.n.02
alternative names (lemmas): "frank", "frankfurter", "hotdog", "hot_dog", "dog", "wiener", "wienerwurst", "weenie"
definition: "a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll"


5. pawl.n.01
alternative names (lemmas): "pawl", "detent", "click", "dog"
definition: "a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward"


6. andiron.n.01
alternative names (lemmas): "andiron", "firedog", "dog", "dog-iron"
definition: "metal supports for logs in a fireplace"
example usage: "the andirons were too hot to touch"


7. chase.v.01
alternative names (lemmas): "chase", "chase_after", "trail", "tail", "tag", "give_chase", "dog", "go_after", "track"
definition: "go after with the intent to catch"
example usage: "The policeman chased the mugger down the alley", "the dog chased the rabbit"
