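I have a simple question. I want to sum two non-parametric distributions.
Here is an example. There are two cities, each with 10 houses. We know the energy consumption of each house. (Edit) I want to get the probability distribution of the sum of the consumption of one randomly chosen house from each city.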
    A1 <- c(1,2,3,3,3,4,4,5,6,7)            # 10 houses' energy consumption for city A
    B1 <- c(11,13,15,17,17,18,18,19,20,22)  # 10 houses' energy consumption for city B
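I have the probability distributions of A1 and B1. How can I get the probability distribution of A1 + B1? If I just compute A1+B1 in R, it gives 12 15 18 20 20 22 22 24 26 29. However, I don't think this is right, because the houses have no inherent order.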
When I change the order of the houses, I get a different result.
    # Original
    A1 <- c(1,2,3,3,3,4,4,5,6,7)
    B1 <- c(11,13,15,17,17,18,18,19,20,22)

    # change order 1
    A2 <- c(7,6,5,4,4,3,3,3,2,1)
    B2 <- c(22,20,19,18,18,17,17,15,13,11)

    # change order 2
    A3 <- c(3,3,3,4,4,5,6,7,1,2)
    B3 <- c(17,17,18,18,19,13,20,11,22,15)

    sum1 <- A1+B1; sum1
    sum2 <- A1+B2; sum2
    sum3 <- A3+B3; sum3
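The red lines are sum1, sum2, and sum3. I don't know how to get the distribution of the sum of two distributions. Please give me any ideas. Thanks!
(If these distributions were normal or uniform, I could easily get the distribution of the sum, but they are not normal and the values have no order.)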
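In theory, the distribution of the sum of two independent random variables is the convolution of their PDFs:
PDF(Z) = PDF(Y) * PDF(X), where Z = X + Y and * denotes convolution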
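For discrete data like the house values in the question, the convolution can be written out as a sum. This is only a sketch of the standard formula, and it assumes the house from city A (X) and the house from city B (Y) are picked independently:

```latex
P(Z = z) = \sum_{x} P(X = x)\, P(Y = z - x), \qquad Z = X + Y

% Worked with the empirical frequencies of A1 and B1:
P(Z = 12) = P(X = 1)\,P(Y = 11) = 0.1 \times 0.1 = 0.01
P(Z = 14) = P(X = 1)\,P(Y = 13) + P(X = 3)\,P(Y = 11) = 0.1 \times 0.1 + 0.3 \times 0.1 = 0.04
```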
So I think this case can be handled by computing the convolution with convolve.
    # your data
    A1 <- c(1,2,3,3,3,4,4,5,6,7)            # 10 houses' energy consumption for city A
    B1 <- c(11,13,15,17,17,18,18,19,20,22)  # 10 houses' energy consumption for city B

    # compute PDF/CDF
    PDF_A1 <- table(A1)/length(A1)
    CDF_A1 <- cumsum(PDF_A1)
    PDF_B1 <- table(B1)/length(B1)
    CDF_B1 <- cumsum(PDF_B1)

    # compute the sum distribution
    # (convolve() needs the second argument reversed to give the usual convolution)
    PDF_C1 <- convolve(PDF_B1, rev(PDF_A1), type = "open")

    # plotting
    plot(PDF_C1, type = "l", axes = FALSE, main = "PDF of A1+B1")
    box()
    axis(2)
    # FIXME: is my understanding of the x-axis correct?
    axis(1, at = seq_along(PDF_C1), labels = c(names(PDF_A1)[-1], names(PDF_B1)))
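Note:
CDF: cumulative distribution function
PDF: probability density function (for discrete data such as these, strictly a probability mass function)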
    ## To make the x-values correspond to actual sums, consider
    ## compute PDF
    ## pad the probability vectors with zeros before convolving
    r <- range(c(A1, B1))
    pdfA <- pdfB <- vector('numeric', diff(r)+1L)
    PDF_A1 <- table(A1)/length(A1)   # same as what you have done
    PDF_B1 <- table(B1)/length(B1)
    ## index = value; this works here because the values are integers starting at 1
    pdfA[as.numeric(names(PDF_A1))] <- as.vector(PDF_A1)  # fill the values
    pdfB[as.numeric(names(PDF_B1))] <- as.vector(PDF_B1)

    ## compute the convolution and plot
    res <- convolve(pdfA, rev(pdfB), type = "open")
    ## res[k] is the probability of the sum k + 1 (res[1] pairs value 1 with value 1)
    plot(seq_along(res) + 1L, res, type="h", xlab='Sum', ylab='')
    ## In this simple case (with discrete distributions) you can compare
    ## to the previous solution
    tst <- rowSums(expand.grid(A1, B1))
    plot(table(tst) / sum(as.vector(table(tst))), type='h')
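
As a cross-check on the brute-force comparison above, here is a minimal sketch that builds the same distribution directly from the two frequency tables, without padding or FFT. It assumes the two houses are chosen independently, and the names pA, pB, joint, sums and pmf_sum are only illustrative:

```r
A1 <- c(1, 2, 3, 3, 3, 4, 4, 5, 6, 7)
B1 <- c(11, 13, 15, 17, 17, 18, 18, 19, 20, 22)

# Empirical probability of each observed value
pA <- prop.table(table(A1))
pB <- prop.table(table(B1))

# Joint probability of every (A, B) pair under independence,
# and the sum that each pair produces
joint <- outer(as.vector(pA), as.vector(pB))
sums  <- outer(as.numeric(names(pA)), as.numeric(names(pB)), "+")

# Add up the probabilities of all pairs that give the same sum
pmf_sum <- tapply(as.vector(joint), as.vector(sums), sum)

# The names of pmf_sum are the actual sums, so the x-axis needs no offset
plot(as.numeric(names(pmf_sum)), pmf_sum, type = "h",
     xlab = "Sum", ylab = "Probability")
```

Because the result is indexed by the sums themselves, this version should also work when the values are not small positive integers, at the cost of forming every pair of distinct values.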