I'm sorry if this is a stupid question, but I've been trying for a few hours now to estimate the density of a set of 2D data. Let's say my data is given by the array: samples = np.random.uniform(0,1,size=(50,2)).
I simply want to use scikit-learn to estimate the density of the samples array (which is of course a 2D uniform density), and I'm trying the following:
import numpy as np
from sklearn.neighbors.kde import KernelDensity
from matplotlib import pyplot as plt

sp = 0.01

samples = np.random.uniform(0,1,size=(50,2))  # random samples
x = y = np.linspace(0,1,100)
X,Y = np.meshgrid(x,y)  # creating grid of data, to evaluate estimated density on

kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(samples)  # creating density from samples
kde.score_samples(X,Y)  # I want to evaluate the estimated density on the X,Y grid
But the last step always produces the error:
score_samples() takes 2 positional arguments but 3 were given
So apparently score_samples does not take a meshgrid as input, but I could not find any tutorial/documentation for the 2D case, so I don't know how to fix this. It would be great if someone could help.
Looking at the Kernel Density Estimate of Species Distributions example, you have to pack the x,y data together (both the training data and the new grid of samples).
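For example, here is a minimal sketch of that fix applied directly to the snippet from the question (reusing the samples, X and Y defined there; np.column_stack is just one way to do the packing):

import numpy as np
from sklearn.neighbors import KernelDensity

samples = np.random.uniform(0, 1, size=(50, 2))   # random samples
x = y = np.linspace(0, 1, 100)
X, Y = np.meshgrid(x, y)                          # evaluation grid

# score_samples expects a single (n_points, 2) array,
# so flatten the grid and stack the coordinates column-wise
grid_points = np.column_stack([X.ravel(), Y.ravel()])

kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(samples)
log_density = kde.score_samples(grid_points)      # log-density at each grid point
density = np.exp(log_density).reshape(X.shape)    # back to the grid shape for plotting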
Here is a function that simplifies the sklearn API for you.
from sklearn.neighbors import KernelDensity

def kde2D(x, y, bandwidth, xbins=100j, ybins=100j, **kwargs):
    """Build 2D kernel density estimate (KDE)."""

    # create grid of sample locations (default: 100x100)
    xx, yy = np.mgrid[x.min():x.max():xbins,
                      y.min():y.max():ybins]

    xy_sample = np.vstack([yy.ravel(), xx.ravel()]).T
    xy_train  = np.vstack([y, x]).T

    kde_skl = KernelDensity(bandwidth=bandwidth, **kwargs)
    kde_skl.fit(xy_train)

    # score_samples() returns the log-likelihood of the samples
    z = np.exp(kde_skl.score_samples(xy_sample))
    return xx, yy, np.reshape(z, xx.shape)
This gives you the xx, yy, zz needed for something like a scatter or pcolormesh plot. I've copied the example from the scipy page on the gaussian_kde function.
import numpy as np
import matplotlib.pyplot as plt

m1 = np.random.normal(size=1000)
m2 = np.random.normal(scale=0.5, size=1000)

x, y = m1 + m2, m1 - m2

xx, yy, zz = kde2D(x, y, 1.0)

plt.pcolormesh(xx, yy, zz)
plt.scatter(x, y, s=2, facecolor='white')
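Since the extra keyword arguments are forwarded straight to KernelDensity, you can also select a different kernel through kde2D. A small sketch (the 'epanechnikov' kernel is just one of the kernels sklearn's KernelDensity accepts, shown for illustration):

xx, yy, zz = kde2D(x, y, 1.0, kernel='epanechnikov')
plt.pcolormesh(xx, yy, zz)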