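I am trying to draw lines between two separate stacked bars (in the same plot) in ggplot2, to show that two segments of the second bar are a subset of the first bar.
I have tried both `geom_line` and `geom_segment`. However, I run into the same problem with either geom: each line (two are needed) requires its own start and end point within the same plot, while the data frame has five rows.
Example code for the plot without the lines: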
    library(data.table)
    library(ggplot2)

    Example <- data.table(
      X_Axis = c('Count', 'Count', 'Dollars', 'Dollars', 'Dollars'),
      Stack_Group = c('Purely A', 'A & B', 'Purely A Dollars',
                      'B Mixed Dollars', 'A Mixed dollars'),
      Value = c(10, 3, 120000, 100000, 50000)
    )
    # Share of each item within its bar
    Example[, Percent := Value/sum(Value), by = X_Axis]

    ggplot(Example, aes(x = X_Axis, y = Percent, fill = factor(Stack_Group))) +
      geom_bar(stat = 'identity', width = 0.5) +
      scale_y_continuous(labels = scales::percent)
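Goal for the final plot:
Rather than hard-coding the start and end positions of the segments, you can grab this data from the plot object. Here is an alternative in which you supply the names of the x categories and of the bar elements between which the lines should be drawn.
Assign the plot to a variable: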
    p <- ggplot() +
      geom_bar(data = Example,
               aes(x = X_Axis, y = Percent, fill = Stack_Group),
               stat = 'identity', width = 0.5)
Grab the data from the plot object (`ggplot_build`). Convert it to a `data.table` (`setDT`):
    d <- ggplot_build(p)$data[[1]]
    setDT(d)
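In the data from the plot object, the 'x' and 'group' variables are not given explicitly by name but as numbers. Because `ggplot` orders categorical variables lexicographically, we can match the numbers with their names by their `rank` within each 'x':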
    d[ , r := rank(group), by = x]

    Example[ , x := .GRP, by = X_Axis]
    Example[ , r := rank(Stack_Group), by = x]
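As a quick check of that assumption, the sketch below compares `rank()` of the names with the integer codes of a default factor for the three 'Dollars' elements. It uses base R only. The names `stack_names`, `name_rank` and `factor_code` are introduced here for illustration, and the agreement relies on the names being unique within a bar.

```r
# Names of the stacked elements in the 'Dollars' bar (taken from the example data)
stack_names <- c("Purely A Dollars", "B Mixed Dollars", "A Mixed dollars")

# Position of each name in sorted order
name_rank <- rank(stack_names)

# Integer codes of a factor built with default (sorted) levels
factor_code <- as.integer(factor(stack_names))

# TRUE when both approaches number the elements of the bar identically
all(name_rank == factor_code)
```

If two elements in the same bar shared a name, `rank()` would return tied (averaged) ranks and the join on `r` would no longer be one-to-one.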
Join to add the names of 'X_Axis' and 'Stack_Group' from the original data to the plot data:
d <- d[Example[ , .(X_Axis, Stack_Group, x, r)], on = .(x, r)]
Set the names of the x categories and of the bar elements between which the lines should be drawn:
    x_start_nm <- "Count"
    x_end_nm <- "Dollars"

    e_start <- "A & B"
    e_upper <- "A Mixed dollars"
    e_lower <- "B Mixed Dollars"
Select the relevant parts of the plot data to create the start/end data for the lines:
    d2 <- data.table(
      x_start = rep(d[X_Axis == x_start_nm & Stack_Group == e_start, xmax], 2),
      y_start = d[X_Axis == x_start_nm & Stack_Group == e_start, c(ymax, ymin)],
      x_end = rep(d[X_Axis == x_end_nm & Stack_Group == e_upper, xmin], 2),
      y_end = c(d[X_Axis == x_end_nm & Stack_Group == e_upper, ymax],
                d[X_Axis == x_end_nm & Stack_Group == e_lower, ymin]))
Add the line segments to the original plot:
    p +
      geom_segment(data = d2,
                   aes(x = x_start, xend = x_end, y = y_start, yend = y_end))
Here is another flexible and straightforward approach. It is somewhat similar to @Henrik's answer but works solely with the user data, so there is no need to extract data from a `ggplot_build()` object.
Code:
    library(data.table)
    library(forcats)

    Example <- data.table(
      X_Axis = fct_inorder(c("Count", "Count", "Dollars", "Dollars", "Dollars")),
      Stack_Group = fct_rev(fct_inorder(c("Purely A", "A & B", "Purely A Dollars",
                                          "B Mixed Dollars", "A Mixed dollars"))),
      Value = c(10, 3, 120000, 100000, 50000),
      Grp2 = fct_inorder(c("Purely", "Mixed", "Purely", "Mixed", "Mixed"))
    )
    Example[, Percent := Value/sum(Value), by = X_Axis]
    Example[order(Grp2, -Stack_Group), Cumulated := cumsum(Percent), by = X_Axis]
Prepared data:
    Example
    #     X_Axis      Stack_Group  Value   Grp2   Percent Cumulated
    # 1:   Count         Purely A     10 Purely 0.7692308 0.7692308
    # 2:   Count            A & B      3  Mixed 0.2307692 1.0000000
    # 3: Dollars Purely A Dollars 120000 Purely 0.4444444 0.4444444
    # 4: Dollars  B Mixed Dollars 100000  Mixed 0.3703704 0.8148148
    # 5: Dollars  A Mixed dollars  50000  Mixed 0.1851852 1.0000000
Code:
    library(ggplot2)

    w <- 0.4  # width of bars
    ggplot(Example, aes(x = X_Axis, y = Percent, fill = Stack_Group)) +
      geom_col(width = w) +
      geom_line(aes(x = (1 - w) * as.numeric(X_Axis) + 1.5 * w, y = Top, group = Grp2),
                data = Example[, .(Top = max(Cumulated)), by = .(X_Axis, Grp2)],
                inherit.aes = FALSE) +
      scale_y_continuous(labels = scales::percent)
Chart:
`ggplot` implicitly coerces `character` variables to `factor`, which controls the order in which items are plotted. By default, the levels of a factor are ordered alphabetically. Here, however, we need to control the plot order explicitly. Therefore, we create factors with a specified order of levels with the help of Hadley's handy `forcats` package.
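The difference between the default level order and an explicitly fixed one can be seen in a small sketch. It uses base R only: `factor(x, levels = unique(x))` stands in for `fct_inorder()` here as an assumed equivalent for this simple case, and the variable names are illustrative.

```r
# Group labels in the order they appear in the data
grp <- c("Purely", "Mixed", "Purely", "Mixed", "Mixed")

# Default factor: levels are sorted alphabetically
default_levels <- levels(factor(grp))

# Levels fixed in order of first appearance
inorder_levels <- levels(factor(grp, levels = unique(grp)))

default_levels
inorder_levels
```

With the default, `"Mixed"` would come first. Fixing the levels by first appearance keeps `"Purely"` first, which is the order the plot relies on.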
The order of levels in `Stack_Group` is reversed to be in line with the order in which `ggplot2` (version 2.2.0+) stacks values (see `?position_stack`).
The data include two types of groups:
One is along the `X_Axis`, distinguishing between `"Count"` and `"Dollars"`.
The other one is hidden in `Stack_Group`, the names of the data items, and in the way the OP wants the line segments drawn. Here, we explicitly define a new variable `Grp2` which distinguishes between `"Purely"` at the bottom of each bar and `"Mixed"` at the top of each bar. This avoids hard-coding the start and end points of the line segments, which makes this solution more flexible.
The cumulative percentages are computed for each bar. These are needed later for drawing the line segments.
The width of the bars is defined in variable `w` and passed to the `width` parameter of `geom_col()`.
Introduced with version 2.2.0 of `ggplot2`, `geom_col()` is a shortcut for `geom_bar(stat = "identity")`.
As there are only two bars, `geom_line()` is used to draw the line segments between them.
On the x-axis, the line segments range from x = 1 + w / 2 to x = 2 - w / 2. Here, we use the fact that `ggplot` uses the integer codes of the factor levels for plotting. So, `"Count"` is plotted at x = 1 and `"Dollars"` at x = 2. (This is why the factor levels had to be defined explicitly.)
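The expression `(1 - w) * as.numeric(X_Axis) + 1.5 * w` used in the plot code maps the bar positions 1 and 2 onto exactly these inner edges. A minimal arithmetic check, with `bar_pos`, `line_x` and `inner_edge` as illustrative names:

```r
w <- 0.4                              # bar width, as in the plot code
bar_pos <- c(Count = 1, Dollars = 2)  # integer codes of the X_Axis levels

# x coordinates used for the line ends in the plot code
line_x <- (1 - w) * bar_pos + 1.5 * w

# Inner edges of the two bars, computed directly
inner_edge <- c(Count = 1 + w / 2, Dollars = 2 - w / 2)

# TRUE: the formula reproduces the inner edges
all.equal(line_x, inner_edge)
```

Note that the formula is specific to two bars at positions 1 and 2. With more bars, the start and end of each segment would have to be computed per pair of neighbours.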
The y values for each bar are taken from the maximum values `Top` of the cumulated percentages in each `Grp2`, which are computed by `Example[, .(Top = max(Cumulated)), by = .(X_Axis, Grp2)]`. This allows the names and order of the data items within each `Grp2` to be changed without touching the plot code.
The parameter `inherit.aes = FALSE` is required to prevent `ggplot` from expecting a value for the `fill` aesthetic.
If required, `Grp2` could easily be visualised using different line types:
    w <- 0.2  # width of bars
    ggplot(Example, aes(x = X_Axis, y = Percent, fill = Stack_Group)) +
      geom_col(width = w) +
      geom_line(aes(x = (1 - w) * as.numeric(X_Axis) + 1.5 * w, y = Top, group = Grp2,
                    linetype = fct_rev(Grp2)),
                data = Example[, .(Top = max(Cumulated)), by = .(X_Axis, Grp2)],
                inherit.aes = FALSE) +
      scale_y_continuous(labels = scales::percent) +
      labs(linetype = "Purely vs Mixed")
Now, the levels of `Grp2` are displayed in the legend. The legend title has been set using `labs()`. The order of levels in `Grp2` has been reversed so that the solid line is at 100% and the legend shows the levels as they are stacked in the chart (`"Purely"` at the bottom, `"Mixed"` above).
Note that the width parameter `w` was also changed for demonstration purposes.