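The figure below was drawn with the R package openair:
I know that matplotlib has the plt.matshow function, but it cannot show the relationships between the variables as clearly.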
Here is my earlier work:
df is a pandas DataFrame with 7 variables, as shown below:
I don't know how to attach a .csv file on StackOverflow.
Using plt.matshow(df.corr(), cmap=plt.cm.Greens), the figure looks like this:
The second figure does not show the correlations between the variables as clearly as the first one.
I have uploaded the csv file to Google Docs here.
I'm not aware of any existing Python library that draws these "ellipse plots", but it's not particularly hard to implement one using matplotlib.collections.EllipseCollection:
    import numpy as np
    import pandas as pd
    from matplotlib import pyplot as plt
    from matplotlib.collections import EllipseCollection

    def plot_corr_ellipses(data, ax=None, **kwargs):

        M = np.array(data)
        if M.ndim != 2:
            raise ValueError('data must be a 2D array')
        if ax is None:
            fig, ax = plt.subplots(1, 1, subplot_kw={'aspect': 'equal'})
            ax.set_xlim(-0.5, M.shape[1] - 0.5)
            ax.set_ylim(-0.5, M.shape[0] - 0.5)

        # xy locations of each ellipse center
        xy = np.indices(M.shape)[::-1].reshape(2, -1).T

        # set the relative sizes of the major/minor axes according to the
        # strength of the positive/negative correlation
        w = np.ones_like(M).ravel()
        h = 1 - np.abs(M).ravel()
        a = 45 * np.sign(M).ravel()

        # Matplotlib >= 3.6 names this keyword offset_transform;
        # older releases call it transOffset
        ec = EllipseCollection(widths=w, heights=h, angles=a, units='x',
                               offsets=xy, offset_transform=ax.transData,
                               array=M.ravel(), **kwargs)
        ax.add_collection(ec)

        # if data is a DataFrame, use the row/column names as tick labels
        if isinstance(data, pd.DataFrame):
            ax.set_xticks(np.arange(M.shape[1]))
            ax.set_xticklabels(data.columns, rotation=90)
            ax.set_yticks(np.arange(M.shape[0]))
            ax.set_yticklabels(data.index)

        return ec
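To make the geometry easier to check without drawing anything, here is a minimal sketch that isolates the two array computations used in the function above. The helper names `ellipse_geometry` and `ellipse_centers` are hypothetical and not part of matplotlib or of the answer's code. They only reproduce the same NumPy expressions on a small input.

```python
import numpy as np

def ellipse_geometry(corr):
    """Map correlation coefficients to ellipse widths, heights and angles.

    Hypothetical helper: mirrors the w/h/a expressions used in
    plot_corr_ellipses, but returns the arrays instead of plotting them.
    """
    r = np.asarray(corr, dtype=float).ravel()
    widths = np.ones_like(r)       # major axis always has unit length
    heights = 1 - np.abs(r)        # |r| = 1 collapses the ellipse to a line
    angles = 45 * np.sign(r)       # +45 deg for r > 0, -45 deg for r < 0
    return widths, heights, angles

def ellipse_centers(shape):
    """Return (x, y) = (column, row) centers in row-major order."""
    return np.indices(shape)[::-1].reshape(2, -1).T

w, h, a = ellipse_geometry([-1.0, -0.5, 0.0, 0.5, 1.0])
print(h)   # heights shrink as |r| grows
print(a)   # sign of r picks the tilt direction
print(ellipse_centers((2, 3)))
```

An uncorrelated pair (r = 0) gets equal width and height, so it is drawn as a circle, while a perfect correlation degenerates to a diagonal line segment. The centers come out in the same row-major order as `M.ravel()`, which is why the colors and shapes line up with the right cells.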
For example, using your data:
    data = df.corr()
    fig, ax = plt.subplots(1, 1)
    m = plot_corr_ellipses(data, ax=ax, cmap='Greens')
    cb = fig.colorbar(m)
    cb.set_label('Correlation coefficient')
    ax.margins(0.1)
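The CSV from the question is not reproduced here, so `df` is undefined for anyone trying the snippet above. As a stand-in, the following sketch builds a synthetic 7-column DataFrame. The column names `v1` to `v7`, the sample size and the noise levels are all assumptions made up for illustration and have nothing to do with the original data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (all values are made up)
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)

df = pd.DataFrame({
    'v1': base,
    'v2': base + 0.3 * rng.normal(size=n),    # strongly positive with v1
    'v3': -base + 0.5 * rng.normal(size=n),   # strongly negative with v1
    'v4': 0.5 * base + rng.normal(size=n),    # moderately positive with v1
    'v5': rng.normal(size=n),                 # independent noise
    'v6': rng.normal(size=n),                 # independent noise
    'v7': -0.5 * base + rng.normal(size=n),   # moderately negative with v1
})

corr = df.corr()   # 7 x 7 Pearson correlation matrix
print(corr.round(2))
```

Because this stand-in contains negative correlations, a diverging colormap with symmetric limits (for example `cmap='seismic', clim=[-1, 1]`) is probably a better fit for it than `'Greens'`.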
Negative correlations can be plotted as ellipses with the opposite orientation:
    fig2, ax2 = plt.subplots(1, 1)
    data2 = np.linspace(-1, 1, 9).reshape(3, 3)
    m2 = plot_corr_ellipses(data2, ax=ax2, cmap='seismic', clim=[-1, 1])
    cb2 = fig2.colorbar(m2)
    ax2.margins(0.3)