Seaborn countplot with a normalized y-axis per group

Author: 手机用户2402851155 | 2023-09-08 13:25

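I would like to know whether it is possible to create a Seaborn count plot that, instead of the actual counts on the y-axis, shows the relative frequency (percentage) within each group (as specified by the hue parameter).

I more or less solved this with the following approach, but I can't imagine it is the easiest way: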

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Plot percentage of occupation per income class
grouped = df.groupby(['income'], sort=False)
occupation_counts = grouped['occupation'].value_counts(normalize=True, sort=False)

occupation_data = [
    {'occupation': occupation, 'income': income, 'percentage': percentage*100} for 
    (income, occupation), percentage in dict(occupation_counts).items()
]

df_occupation = pd.DataFrame(occupation_data)

p = sns.barplot(x="occupation", y="percentage", hue="income", data=df_occupation)
_ = plt.setp(p.get_xticklabels(), rotation=90)  # Rotate labels

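Result:

[Image: percentage plot with seaborn]

I am using the well-known adult data set from the UCI machine learning repository. The pandas DataFrame is created like this: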

# Read the adult dataset
df = pd.read_csv(
    "data/adult.data",
    engine='c',
    lineterminator='\n',

    names=['age', 'workclass', 'fnlwgt', 'education', 'education_num',
           'marital_status', 'occupation', 'relationship', 'race', 'sex',
           'capital_gain', 'capital_loss', 'hours_per_week',
           'native_country', 'income'],
    header=None,
    skipinitialspace=True,
    na_values="?"
)

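This question is somewhat related, but it does not use the hue parameter. In my case I cannot simply change the labels on the y-axis, because the height of each bar must depend on the group.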
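To make the last point concrete, here is a minimal sketch on made-up data (the `group` and `category` columns and their values are assumptions, not the adult dataset). Relabeling the y-axis amounts to dividing every count by the same grand total, which leaves the relative bar heights unchanged, whereas per-group normalization divides each group by its own total:

```python
import pandas as pd

# Toy data (hypothetical): group "a" has 8 rows, group "b" only 2
df = pd.DataFrame({
    "group": ["a"] * 8 + ["b"] * 2,
    "category": ["x"] * 6 + ["y"] * 2 + ["x"] * 1 + ["y"] * 1,
})

# Raw counts: rows are categories, columns are groups
counts = pd.crosstab(df["category"], df["group"])

# Dividing by the grand total only rescales the axis
global_pct = counts / counts.to_numpy().sum() * 100

# Dividing each column by its own total changes the relative heights
per_group_pct = counts / counts.sum(axis=0) * 100

print(global_pct)
print(per_group_pct)
```

In the second table each column sums to 100, so the small group "b" is no longer dwarfed by group "a".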

1> Pietro Batti..：

I might be confused. The difference between your output and the output of

occupation_counts = (df.groupby(['income'])['occupation']
                     .value_counts(normalize=True)
                     .rename('percentage')
                     .mul(100)
                     .reset_index()
                     .sort_values('occupation'))
p = sns.barplot(x="occupation", y="percentage", hue="income", data=occupation_counts)
_ = plt.setp(p.get_xticklabels(), rotation=90)  # Rotate labels

is, it seems to me, only the order of the columns.

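You seem to care about this, since you pass sort=False. But in your code the order is determined purely by chance (and the order in which the dictionary is iterated even changes from run to run with Python 3.5).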
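If the order matters, it can be pinned explicitly rather than left to dictionary iteration. The following is a minimal sketch on made-up data (the toy DataFrame is an assumption standing in for the adult dataset); it computes the per-group percentages and builds an explicit category order:

```python
import pandas as pd

# Toy data (hypothetical), standing in for the adult dataset
df = pd.DataFrame({
    "income": ["<=50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"],
    "occupation": ["Sales", "Sales", "Tech", "Craft", "Tech", "Sales"],
})

# Percentage of each occupation within its income group
percentages = (
    df.groupby("income")["occupation"]
      .value_counts(normalize=True)
      .mul(100)
      .rename("percentage")
      .reset_index()
)

# Explicit, reproducible category order (alphabetical here)
order = sorted(df["occupation"].unique())

print(percentages)
print(order)
```

Passing `order=order` (and, if needed, `hue_order=`) to `sns.barplot` then fixes the bar order regardless of how the rows happen to be sorted.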

2> 小智..：

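I was surprised that Seaborn does not provide anything like this out of the box.

Still, it is fairly easy to adapt the source code to get what you want. The code below provides the function "percentageplot(x, hue, data)", which works just like sns.countplot but normalizes each bar per group (i.e. it divides the value of each green bar by the sum of all green bars).

In effect, it turns this (hard to interpret because Apple and Android have different N): [Image: sns.countplot] into this (normalized so that each bar reflects its share of the total for Apple or Android, respectively): [Image: percentageplot]

Hope this helps!!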

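# Note: this relies on private seaborn internals (_CategoricalPlotter, remove_na)
# as they existed when the answer was written; newer releases may not provide them.
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from seaborn import utils
from seaborn.algorithms import bootstrap
from seaborn.categorical import _CategoricalPlotter, remove_na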

class _CategoricalStatPlotter(_CategoricalPlotter):

    @property
    def nested_width(self):
        """A float with the width of plot elements when hue nesting is used."""
        return self.width / len(self.hue_names)

    def estimate_statistic(self, estimator, ci, n_boot):

        if self.hue_names is None:
            statistic = []
            confint = []
        else:
            statistic = [[] for _ in self.plot_data]
            confint = [[] for _ in self.plot_data]

        for i, group_data in enumerate(self.plot_data):
            # Option 1: we have a single layer of grouping
            # --------------------------------------------

            if self.plot_hues is None:

                if self.plot_units is None:
                    stat_data = remove_na(group_data)
                    unit_data = None
                else:
                    unit_data = self.plot_units[i]
                    have = pd.notnull(np.c_[group_data, unit_data]).all(axis=1)
                    stat_data = group_data[have]
                    unit_data = unit_data[have]

                # Estimate a statistic from the vector of data
                if not stat_data.size:
                    statistic.append(np.nan)
                else:
                    statistic.append(estimator(stat_data, len(np.concatenate(self.plot_data))))

                # Get a confidence interval for this estimate
                if ci is not None:

                    if stat_data.size < 2:
                        confint.append([np.nan, np.nan])
                        continue

                    boots = bootstrap(stat_data, func=estimator,
                                      n_boot=n_boot,
                                      units=unit_data)
                    confint.append(utils.ci(boots, ci))

            # Option 2: we are grouping by a hue layer
            # ----------------------------------------

            else:
                for j, hue_level in enumerate(self.hue_names):
                    if not self.plot_hues[i].size:
                        statistic[i].append(np.nan)
                        if ci is not None:
                            confint[i].append((np.nan, np.nan))
                        continue

                    hue_mask = self.plot_hues[i] == hue_level
                    group_total_n = (np.concatenate(self.plot_hues) == hue_level).sum()
                    if self.plot_units is None:
                        stat_data = remove_na(group_data[hue_mask])
                        unit_data = None
                    else:
                        group_units = self.plot_units[i]
                        have = pd.notnull(
                            np.c_[group_data, group_units]
                            ).all(axis=1)
                        stat_data = group_data[hue_mask & have]
                        unit_data = group_units[hue_mask & have]

                    # Estimate a statistic from the vector of data
                    if not stat_data.size:
                        statistic[i].append(np.nan)
                    else:
                        statistic[i].append(estimator(stat_data, group_total_n))

                    # Get a confidence interval for this estimate
                    if ci is not None:

                        if stat_data.size < 2:
                            confint[i].append([np.nan, np.nan])
                            continue

                        boots = bootstrap(stat_data, func=estimator,
                                          n_boot=n_boot,
                                          units=unit_data)
                        confint[i].append(utils.ci(boots, ci))

        # Save the resulting values for plotting
        self.statistic = np.array(statistic)
        self.confint = np.array(confint)

        # Rename the value label to reflect the estimation
        if self.value_label is not None:
            self.value_label = "{}({})".format(estimator.__name__,
                                               self.value_label)

    def draw_confints(self, ax, at_group, confint, colors,
                      errwidth=None, capsize=None, **kws):

        if errwidth is not None:
            kws.setdefault("lw", errwidth)
        else:
            kws.setdefault("lw", mpl.rcParams["lines.linewidth"] * 1.8)

        for at, (ci_low, ci_high), color in zip(at_group,
                                                confint,
                                                colors):
            if self.orient == "v":
                ax.plot([at, at], [ci_low, ci_high], color=color, **kws)
                if capsize is not None:
                    ax.plot([at - capsize / 2, at + capsize / 2],
                            [ci_low, ci_low], color=color, **kws)
                    ax.plot([at - capsize / 2, at + capsize / 2],
                            [ci_high, ci_high], color=color, **kws)
            else:
                ax.plot([ci_low, ci_high], [at, at], color=color, **kws)
                if capsize is not None:
                    ax.plot([ci_low, ci_low],
                            [at - capsize / 2, at + capsize / 2],
                            color=color, **kws)
                    ax.plot([ci_high, ci_high],
                            [at - capsize / 2, at + capsize / 2],
                            color=color, **kws)

class _BarPlotter(_CategoricalStatPlotter):
    """Show point estimates and confidence intervals with bars."""

    def __init__(self, x, y, hue, data, order, hue_order,
                 estimator, ci, n_boot, units,
                 orient, color, palette, saturation, errcolor, errwidth=None,
                 capsize=None):
        """Initialize the plotter."""
        self.establish_variables(x, y, hue, data, orient,
                                 order, hue_order, units)
        self.establish_colors(color, palette, saturation)
        self.estimate_statistic(estimator, ci, n_boot)

        self.errcolor = errcolor
        self.errwidth = errwidth
        self.capsize = capsize

    def draw_bars(self, ax, kws):
        """Draw the bars onto `ax`."""
        # Get the right matplotlib function depending on the orientation
        barfunc = ax.bar if self.orient == "v" else ax.barh
        barpos = np.arange(len(self.statistic))

        if self.plot_hues is None:

            # Draw the bars
            barfunc(barpos, self.statistic, self.width,
                    color=self.colors, align="center", **kws)

            # Draw the confidence intervals
            errcolors = [self.errcolor] * len(barpos)
            self.draw_confints(ax,
                               barpos,
                               self.confint,
                               errcolors,
                               self.errwidth,
                               self.capsize)

        else:

            for j, hue_level in enumerate(self.hue_names):

                # Draw the bars
                offpos = barpos + self.hue_offsets[j]
                barfunc(offpos, self.statistic[:, j], self.nested_width,
                        color=self.colors[j], align="center",
                        label=hue_level, **kws)

                # Draw the confidence intervals
                if self.confint.size:
                    confint = self.confint[:, j]
                    errcolors = [self.errcolor] * len(offpos)
                    self.draw_confints(ax,
                                       offpos,
                                       confint,
                                       errcolors,
                                       self.errwidth,
                                       self.capsize)

    def plot(self, ax, bar_kws):
        """Make the plot."""
        self.draw_bars(ax, bar_kws)
        self.annotate_axes(ax)
        if self.orient == "h":
            ax.invert_yaxis()

def percentageplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
              orient=None, color=None, palette=None, saturation=.75,
              ax=None, **kwargs):

    # Estimator calculates required statistic (proportion)        
    estimator = lambda x, y: (float(len(x))/y)*100 
    ci = None
    n_boot = 0
    units = None
    errcolor = None

    if x is None and y is not None:
        orient = "h"
        x = y
    elif y is None and x is not None:
        orient = "v"
        y = x
    elif x is not None and y is not None:
        raise TypeError("Cannot pass values for both `x` and `y`")
    else:
        raise TypeError("Must pass values for either `x` or `y`")

    plotter = _BarPlotter(x, y, hue, data, order, hue_order,
                          estimator, ci, n_boot, units,
                          orient, color, palette, saturation,
                          errcolor)

    plotter.value_label = "Percentage"

    if ax is None:
        ax = plt.gca()

    plotter.plot(ax, kwargs)
    return ax
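
The normalization step buried in `estimate_statistic` above can be looked at in isolation. This is a small sketch with made-up hue arrays (the Apple/Android values and the `percentage_estimator` name are assumptions for illustration); it applies the same formula as the answer's lambda, bar count divided by the total count of that hue level, times 100:

```python
import numpy as np

def percentage_estimator(values, group_total):
    """Share of the hue group that falls into this bar, in percent."""
    return float(len(values)) / group_total * 100

# Hypothetical data: one array of hue labels per x category
plot_hues = [
    np.array(["Apple", "Apple", "Android"]),               # x category 0
    np.array(["Apple", "Android", "Android", "Android"]),  # x category 1
]

results = {}
for hue_level in ["Apple", "Android"]:
    # Total number of observations with this hue across all x categories
    group_total_n = (np.concatenate(plot_hues) == hue_level).sum()
    results[hue_level] = [
        percentage_estimator(hues[hues == hue_level], group_total_n)
        for hues in plot_hues
    ]

print(results)
```

For each hue level the values sum to 100 across the x categories, which is what makes groups of different size comparable.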
