
Extracting keyphrases from text (1-4 word n-grams)

Author: 农大军乐团_697 | 2023-06-19 20:22


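What's the best way to extract keyphrases from a block of text? I'm writing a keyword extraction tool: something like this. I've found a few libraries for Python and Perl that extract n-grams, but I'm writing this in Node, so I need a JavaScript solution. If there aren't any existing JavaScript libraries, could someone explain how to do this so I can write it myself?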
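At its core, the question reduces to a sliding window: the n-grams of a token list are its runs of n consecutive tokens, and "1-4 word n-grams" are what you get for n = 1 through 4. A minimal JavaScript sketch of that window (the helper name `ngrams` is hypothetical, not a library function):

```javascript
// Hypothetical helper: returns every run of n consecutive tokens, joined by spaces.
function ngrams(tokens, n) {
  const out = [];
  // Slide a window of width n over the tokens; stop when it would run past the end.
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(" "));
  }
  return out;
}

const tokens = "in time in time".split(" ");
console.log(ngrams(tokens, 2)); // [ 'in time', 'time in', 'in time' ]
```

Counting how often each resulting string occurs (for example with an object or a `Map` keyed by the phrase) turns these windows into frequency statistics.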

1> Rob W:

I liked the idea, so I've implemented it: see below (descriptive comments included).
Preview: http://fiddle.jshell.net/WsKMx/

/*@author Rob W, created on 16-17 September 2011, on request for Stackoverflow (http://stackoverflow.com/q/7085454/938089)
 * Modified on 17 July 2012, fixed IE bug by replacing [,] with [null]
 * This script counts words and phrases. For simplicity and efficiency,
 * there's only one loop through a block of text.
 * 100% accuracy would require much more computing power, which is usually unnecessary
 **/


var text = "A quick brown fox jumps over the lazy old bartender who said 'Hi!' as a response to the visitor who presumably assaulted the maid's brother, because he didn't pay his debts in time. In time in time does really mean in time. Too late is too early? Nonsense! 'Too late is too early' does not make any sense.";

var atLeast = 2;       // Show results with at least .. occurrences
var numWords = 5;      // Show statistics for one to .. words
var ignoreCase = true; // Case-sensitivity
var REallowedChars = /[^a-zA-Z'\-]+/g;
 // RE pattern matching runs of invalid characters (anything except letters, apostrophes and hyphens); they are replaced with a space

var i, j, k, textlen, len, s;
// Prepare key hash
var keys = [null]; // keys[0] stays null: a phrase of zero words is never counted; keys[n] holds the n-word phrases
var results = [];
numWords++; //for human logic, we start counting at 1 instead of 0
for (i=1; i<=numWords; i++) {
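    keys.push(Object.create(null)); // prototype-free object, so words like "constructor" don't hit inherited properties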
}

// Remove all irrelevant characters
text = text.replace(REallowedChars, " ").replace(/^\s+/,"").replace(/\s+$/,"");

// Create a hash
if (ignoreCase) text = text.toLowerCase();
text = text.split(/\s+/);
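for (i=0, textlen=text.length; i<textlen; i++) {
    s = text[i];
    keys[1][s] = (keys[1][s] || 0) + 1; // count the single word
    for (j=2; j<=numWords; j++) {
        if (i+j <= textlen) {
            s += " " + text[i+j-1]; // extend the phrase by the next word
            keys[j][s] = (keys[j][s] || 0) + 1;
        } else break;
    }
}

// Prepare results for further analysis: keep phrases with at least `atLeast` occurrences
for (k=1; k<=numWords; k++) {
    results[k] = [];
    var key = keys[k];
    for (i in key) {
        if (key[i] >= atLeast) results[k].push({"word":i, "count":key[i]});
    }
}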

// Result parsing
var outputHTML = []; // Buffer data. This data is used to create a table using `.innerHTML`

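// Sorts by count, highest first (despite the "Ascending" in its name)
var f_sortAscending = function(x,y) {return y.count - x.count;};
for (k=1; k<numWords; k++) {
    results[k].sort(f_sortAscending); // sorts results

    // Customize your output. For example:
    var words = results[k];
    if (words.length) outputHTML.push('<td colSpan="3" class="num-words-header">'+k+' word'+(k==1?"":"s")+'</td>');
    for (i=0,len=words.length; i<len; i++) {
        // Only letters, apostrophes and hyphens survive REallowedChars, so this is safe from XSS
        outputHTML.push("<td>" + words[i].word + "</td><td>" +
           words[i].count + "</td><td>" +
           Math.round(words[i].count/textlen*10000)/100 + "%</td>");
           // textlen (the total word count) is set in the hashing loop above
           // The relative occurrence is rounded to 2 decimal places.
    }
}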
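outputHTML = '<table id="wordAnalysis"><thead><tr>' +
              '<td>Phrase</td><td>Count</td><td>Relativity</td></tr>' +
              '</thead><tbody><tr>' +outputHTML.join("</tr><tr>")+
               "</tr></tbody></table>";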
document.getElementById("RobW-sample").innerHTML = outputHTML;
/*
CSS:
#wordAnalysis td{padding:1px 3px 1px 5px}
.num-words-header{font-weight:bold;border-top:1px solid #000}

HTML:
<div id="RobW-sample"></div>
*/
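The script above is written for a browser page (it ends by writing a table into `document.getElementById("RobW-sample")`), while the question targets Node. Below is a minimal sketch of the same single-pass counting packaged as a plain function that returns data instead of HTML. The name `countPhrases`, its options object, and the defaults (phrases of up to 4 words, seen at least twice) are assumptions chosen to match the question, not part of the original answer. It also swaps the plain objects for `Map`, so words such as "constructor" cannot collide with inherited object properties.

```javascript
// Hypothetical helper (not from any library): counts 1..maxWords-word phrases
// in a single pass over the words, returning plain data instead of HTML.
function countPhrases(text, { maxWords = 4, atLeast = 2, ignoreCase = true } = {}) {
  // Same filter as REallowedChars: anything other than letters,
  // apostrophes and hyphens becomes a word separator.
  let cleaned = text.replace(/[^a-zA-Z'\-]+/g, " ").trim();
  if (ignoreCase) cleaned = cleaned.toLowerCase();
  // "".split(/\s+/) would yield [""], so handle empty input explicitly.
  const words = cleaned === "" ? [] : cleaned.split(/\s+/);

  // counts[n] maps an n-word phrase to its number of occurrences.
  const counts = [];
  for (let n = 1; n <= maxWords; n++) counts[n] = new Map();

  // For each start position, grow the phrase one word at a time.
  for (let i = 0; i < words.length; i++) {
    let phrase = "";
    for (let n = 1; n <= maxWords && i + n <= words.length; n++) {
      phrase = n === 1 ? words[i] : phrase + " " + words[i + n - 1];
      counts[n].set(phrase, (counts[n].get(phrase) || 0) + 1);
    }
  }

  // Keep phrases seen at least `atLeast` times, most frequent first.
  const results = {};
  for (let n = 1; n <= maxWords; n++) {
    results[n] = [...counts[n]]
      .filter(([, count]) => count >= atLeast)
      .map(([phrase, count]) => ({
        phrase,
        count,
        // Same "Relativity" as the script: share of total words, 2 decimals.
        percent: Math.round((count / words.length) * 10000) / 100,
      }))
      .sort((a, b) => b.count - a.count);
  }
  return results;
}

// Example: two-word phrases that occur at least twice.
console.log(countPhrases("In time in time does really mean in time.")[2]);
// [ { phrase: 'in time', count: 3, percent: 33.33 } ]
```

As in the original script, the percentage divides by the total number of words for every phrase length, so it is a rough relative frequency rather than a share of all n-grams of that length.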
