Practical examples of NLTK use

Author: 罗文彬2502852027 | 2023-08-30 20:18


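I'm playing around with the Natural Language Toolkit (NLTK).

Its documentation (the Book and the HOWTOs) is quite bulky, and the examples are sometimes slightly advanced.

Are there any good but basic examples of uses/applications of NLTK? I'm thinking of something like the NLTK articles on the Stream Hacker blog.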

1> Mat..：

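Here is my own practical example, for the benefit of anyone else who looks up this question (excuse the sample text, it was the first thing I found on Wikipedia):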

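import nltk
import pprint

# Requires the Brown corpus: run nltk.download('brown') once beforehand.
tokenizer = None
tagger = None

def init_nltk():
    # Build the tokenizer and tagger once, on first use.
    global tokenizer
    global tagger
    # Tokens are runs of word characters or runs of punctuation.
    tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+|[^\w\s]+')
    tagger = nltk.UnigramTagger(nltk.corpus.brown.tagged_sents())

def tag(text):
    global tokenizer
    global tagger
    if not tokenizer:
        init_nltk()
    tokenized = tokenizer.tokenize(text)
    tagged = tagger.tag(tokenized)
    # Sort by tag. Unknown words get the tag None, which Python 3 cannot
    # compare with str, so None is treated as an empty string here.
    tagged.sort(key=lambda pair: pair[1] or '')
    return tagged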

def main():
    text = """Mr Blobby is a fictional character who featured on Noel
    Edmonds' Saturday night entertainment show Noel's House Party,
    which was often a ratings winner in the 1990s. Mr Blobby also
    appeared on the Jamie Rose show of 1997. He was designed as an
    outrageously over the top parody of a one-dimensional, mute novelty
    character, which ironically made him distinctive, absurd and popular.
    He was a large pink humanoid, covered with yellow spots, sporting a
    permanent toothy grin and jiggling eyes. He communicated by saying
    the word "blobby" in an electronically-altered voice, expressing
    his moods through tone of voice and repetition.

    There was a Mrs. Blobby, seen briefly in the video, and sold as a
    doll.

    However Mr Blobby actually started out as part of the 'Gotcha'
    feature during the show's second series (originally called 'Gotcha
    Oscars' until the threat of legal action from the Academy of Motion
    Picture Arts and Sciences[citation needed]), in which celebrities
    were caught out in a Candid Camera style prank. Celebrities such as
    dancer Wayne Sleep and rugby union player Will Carling would be
    enticed to take part in a fictitious children's programme based around
    their profession. Mr Blobby would clumsily take part in the activity,
    knocking over the set, causing mayhem and saying "blobby blobby
    blobby", until finally when the prank was revealed, the Blobby
    costume would be opened - revealing Noel inside. This was all the more
    surprising for the "victim" as during rehearsals Blobby would be
    played by an actor wearing only the arms and legs of the costume and
    speaking in a normal manner.[citation needed]"""
    tagged = tag(text)
    # Drop duplicate (word, tag) pairs, then sort by tag again (None first).
    l = list(set(tagged))
    l.sort(key=lambda pair: pair[1] or '')
    pprint.pprint(l)

if __name__ == '__main__':
    main()

Output:

[('rugby', None),
 ('Oscars', None),
 ('1990s', None),
 ('",', None),
 ('Candid', None),
 ('"', None),
 ('blobby', None),
 ('Edmonds', None),
 ('Mr', None),
 ('outrageously', None),
 ('.[', None),
 ('toothy', None),
 ('Celebrities', None),
 ('Gotcha', None),
 (']),', None),
 ('Jamie', None),
 ('humanoid', None),
 ('Blobby', None),
 ('Carling', None),
 ('enticed', None),
 ('programme', None),
 ('1997', None),
 ('s', None),
 ("'", "'"),
 ('[', '('),
 ('(', '('),
 (']', ')'),
 (',', ','),
 ('.', '.'),
 ('all', 'ABN'),
 ('the', 'AT'),
 ('an', 'AT'),
 ('a', 'AT'),
 ('be', 'BE'),
 ('were', 'BED'),
 ('was', 'BEDZ'),
 ('is', 'BEZ'),
 ('and', 'CC'),
 ('one', 'CD'),
 ('until', 'CS'),
 ('as', 'CS'),
 ('This', 'DT'),
 ('There', 'EX'),
 ('of', 'IN'),
 ('inside', 'IN'),
 ('from', 'IN'),
 ('around', 'IN'),
 ('with', 'IN'),
 ('through', 'IN'),
 ('-', 'IN'),
 ('on', 'IN'),
 ('in', 'IN'),
 ('by', 'IN'),
 ('during', 'IN'),
 ('over', 'IN'),
 ('for', 'IN'),
 ('distinctive', 'JJ'),
 ('permanent', 'JJ'),
 ('mute', 'JJ'),
 ('popular', 'JJ'),
 ('such', 'JJ'),
 ('fictional', 'JJ'),
 ('yellow', 'JJ'),
 ('pink', 'JJ'),
 ('fictitious', 'JJ'),
 ('normal', 'JJ'),
 ('dimensional', 'JJ'),
 ('legal', 'JJ'),
 ('large', 'JJ'),
 ('surprising', 'JJ'),
 ('absurd', 'JJ'),
 ('Will', 'MD'),
 ('would', 'MD'),
 ('style', 'NN'),
 ('threat', 'NN'),
 ('novelty', 'NN'),
 ('union', 'NN'),
 ('prank', 'NN'),
 ('winner', 'NN'),
 ('parody', 'NN'),
 ('player', 'NN'),
 ('actor', 'NN'),
 ('character', 'NN'),
 ('victim', 'NN'),
 ('costume', 'NN'),
 ('action', 'NN'),
 ('activity', 'NN'),
 ('dancer', 'NN'),
 ('grin', 'NN'),
 ('doll', 'NN'),
 ('top', 'NN'),
 ('mayhem', 'NN'),
 ('citation', 'NN'),
 ('part', 'NN'),
 ('repetition', 'NN'),
 ('manner', 'NN'),
 ('tone', 'NN'),
 ('Picture', 'NN'),
 ('entertainment', 'NN'),
 ('night', 'NN'),
 ('series', 'NN'),
 ('voice', 'NN'),
 ('Mrs', 'NN'),
 ('video', 'NN'),
 ('Motion', 'NN'),
 ('profession', 'NN'),
 ('feature', 'NN'),
 ('word', 'NN'),
 ('Academy', 'NN-TL'),
 ('Camera', 'NN-TL'),
 ('Party', 'NN-TL'),
 ('House', 'NN-TL'),
 ('eyes', 'NNS'),
 ('spots', 'NNS'),
 ('rehearsals', 'NNS'),
 ('ratings', 'NNS'),
 ('arms', 'NNS'),
 ('celebrities', 'NNS'),
 ('children', 'NNS'),
 ('moods', 'NNS'),
 ('legs', 'NNS'),
 ('Sciences', 'NNS-TL'),
 ('Arts', 'NNS-TL'),
 ('Wayne', 'NP'),
 ('Rose', 'NP'),
 ('Noel', 'NP'),
 ('Saturday', 'NR'),
 ('second', 'OD'),
 ('his', 'PP$'),
 ('their', 'PP$'),
 ('him', 'PPO'),
 ('He', 'PPS'),
 ('more', 'QL'),
 ('However', 'RB'),
 ('actually', 'RB'),
 ('also', 'RB'),
 ('clumsily', 'RB'),
 ('originally', 'RB'),
 ('only', 'RB'),
 ('often', 'RB'),
 ('ironically', 'RB'),
 ('briefly', 'RB'),
 ('finally', 'RB'),
 ('electronically', 'RB-HL'),
 ('out', 'RP'),
 ('to', 'TO'),
 ('show', 'VB'),
 ('Sleep', 'VB'),
 ('take', 'VB'),
 ('opened', 'VBD'),
 ('played', 'VBD'),
 ('caught', 'VBD'),
 ('appeared', 'VBD'),
 ('revealed', 'VBD'),
 ('started', 'VBD'),
 ('saying', 'VBG'),
 ('causing', 'VBG'),
 ('expressing', 'VBG'),
 ('knocking', 'VBG'),
 ('wearing', 'VBG'),
 ('speaking', 'VBG'),
 ('sporting', 'VBG'),
 ('revealing', 'VBG'),
 ('jiggling', 'VBG'),
 ('sold', 'VBN'),
 ('called', 'VBN'),
 ('made', 'VBN'),
 ('altered', 'VBN'),
 ('based', 'VBN'),
 ('designed', 'VBN'),
 ('covered', 'VBN'),
 ('communicated', 'VBN'),
 ('needed', 'VBN'),
 ('seen', 'VBN'),
 ('set', 'VBN'),
 ('featured', 'VBN'),
 ('which', 'WDT'),
 ('who', 'WPS'),
 ('when', 'WRB')]

What does this do? Could you add a description? Also, why use globals? You could just use the objects directly.
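One possible response to the comment above, offered as a sketch rather than the answerer's own code: the tokenizer and tagger can live in a small class instead of module-level globals, and a `DefaultTagger` backoff replaces the `None` tags seen in the output. The class name `SimpleTagger` and the tiny hand-written training sentences are assumptions made so that the example runs without downloading the Brown corpus; with the corpus installed, `nltk.corpus.brown.tagged_sents()` could be passed in instead.

```python
import nltk

# Hypothetical toy training data using Brown-style tags; the original answer
# trains on nltk.corpus.brown.tagged_sents() instead.
TOY_SENTS = [
    [("He", "PPS"), ("was", "BEDZ"), ("a", "AT"), ("large", "JJ"),
     ("pink", "JJ"), ("character", "NN"), (".", ".")],
    [("the", "AT"), ("show", "NN"), ("was", "BEDZ"), ("popular", "JJ"),
     (".", ".")],
]


class SimpleTagger:
    """Bundle tokenizer and tagger in one object (illustrative name)."""

    def __init__(self, train_sents, default_tag="NN"):
        # Same pattern as the answer: word runs, or runs of punctuation.
        self.tokenizer = nltk.tokenize.RegexpTokenizer(r"\w+|[^\w\s]+")
        # Words the unigram model has never seen fall back to default_tag
        # rather than None.
        backoff = nltk.DefaultTagger(default_tag)
        self.tagger = nltk.UnigramTagger(train_sents, backoff=backoff)

    def tag(self, text):
        tokens = self.tokenizer.tokenize(text)
        # Every tag is now a string, so sorting by tag is safe in Python 3.
        return sorted(self.tagger.tag(tokens), key=lambda pair: pair[1])


tagger = SimpleTagger(TOY_SENTS)
tagged = tagger.tag("He was a pink humanoid.")
print(tagged)
```

Here "humanoid" never appears in the training data, so it receives the default tag instead of `None`. Guessing "NN" for every unknown word is crude, but it keeps downstream code from having to special-case missing tags.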

2> Pete Mancini..：

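NLP in general is very useful, so you may want to broaden your search to general applications of text analytics. I used NLTK to help with MOSS 2010 by generating file taxonomies from extracted concept maps. It worked really well. It doesn't take long before files start to cluster in useful ways.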

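Often, to understand text analytics, you have to reconsider the ways you are accustomed to thinking. For example, text analytics is extremely useful for discovery. Most people, though, don't even know the difference between search and discovery. If you read up on those subjects, you will likely "discover" ways in which you might want to put NLTK to work.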

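Also, consider your world view of text files without NLTK. You have a bunch of random-length strings separated by whitespace and punctuation. Some of the punctuation changes meaning depending on how it is used, such as the period (which is also a decimal point and the trailing marker of an abbreviation). With NLTK you get words and, more to the point, you get parts of speech. Now you have a handle on the content. Use NLTK to discover the concepts and actions in the document. Use NLTK to get at the "meaning" of the document. Meaning in this case refers to the essential relationships in the document.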
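The "strings separated by whitespace and punctuation" view can be made concrete with a small sketch. It uses only the standard `re` module and borrows the token pattern from the first answer's `RegexpTokenizer`; the sample sentence is made up for illustration. It shows why a period on its own cannot be trusted as a sentence boundary, which is the kind of ambiguity described above.

```python
import re

text = "Mr. Blobby paid 3.50 for a doll. It sold well."

# Naive view: strings separated by whitespace, punctuation still attached.
naive = text.split()

# Pattern from the first answer: runs of word characters, or runs of punctuation.
tokens = re.findall(r"\w+|[^\w\s]+", text)

# The period plays three roles here: abbreviation marker, decimal point
# and sentence end. Counting periods overestimates the sentence count.
period_count = tokens.count(".")

print(naive)
print(tokens)
print(period_count)
```

The text has two sentences but four periods, and the regex also splits "3.50" into three tokens. Telling these cases apart takes more than pattern matching, which is where a toolkit's sentence tokenizers and taggers come in.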

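It is a good thing that you are curious about NLTK. Text analytics is set to break out in a big way in the next few years. Those who understand it will be better placed to take advantage of the new opportunities.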

3> Jacob..：

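I'm the author of streamhacker.com (thanks for the mention, I get a fair amount of click traffic from this particular question). What specifically are you trying to do? NLTK has a lot of tools for doing various things, but it is somewhat lacking in clear information on what to use the tools for and how best to use them. It is also oriented towards academic problems, so translating the pedagogical examples into practical solutions can take real effort.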
