R package coga: Convolution of Gamma Distributions

Chaoran Hu

2017-07-27

Introduction

The R package coga calculates the density and distribution function of a convolution of gamma distributions, that is, the distribution of a sum of independent gamma random variables. The algorithm used in this package comes from Moschopoulos (1985). The R code in this vignette can also serve as usage examples.

Algorithm

Assume that we have independent random variables \(X_1, ..., X_n\), where \(X_i\) follows a gamma distribution with shape parameter \(\alpha_i\) and scale parameter \(\beta_i\), for \(i = 1, ..., n\). Then the density of \(Y = X_1 + ... + X_n\) can be expressed as:

\[g(y) = C \sum_{k=0}^{\infty} \lambda_k y^{\rho + k - 1} e^{-y/\beta_1} / (\Gamma(\rho + k) \beta_{1}^{\rho + k})\]

And the distribution function \(G(w)=Pr(Y<w)\) is expressed as:

\[G(w) = C \sum_{k=0}^{\infty} \lambda_k \int_{0}^{w} (y^{\rho + k - 1} e^{-y/\beta_1} / (\Gamma(\rho + k) \beta_{1}^{\rho + k})) dy\]

The integral in this formula is the integral of a gamma density (a regularized incomplete gamma function), so it can be calculated with the distribution function of the gamma distribution.
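As a small sanity check of this statement, the sketch below integrates one term of the series numerically and compares it with pgamma from base R. The values of rho, k, beta1, and w are arbitrary illustrative choices, not values used by the package.

```r
# Illustrative values (assumptions, chosen only for this check)
rho   <- 7      # sum of the shape parameters
k     <- 2      # index of the series term
beta1 <- 1 / 3  # scale parameter appearing in the series
w     <- 1.5    # upper limit of integration

# Integrand of the k-th term: a gamma density with shape rho + k and scale beta1
term <- function(y) {
  y^(rho + k - 1) * exp(-y / beta1) / (gamma(rho + k) * beta1^(rho + k))
}

by_integration <- integrate(term, lower = 0, upper = w)$value
by_pgamma      <- pgamma(w, shape = rho + k, scale = beta1)

# The two values should agree up to numerical integration error
all.equal(by_integration, by_pgamma, tolerance = 1e-6)
```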

More details about this algorithm can be found in Moschopoulos (1985).
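The series above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's C++ implementation. It assumes the coefficient definitions given in Moschopoulos (1985): \(\beta_1 = \min_i \beta_i\), \(\rho = \sum_i \alpha_i\), \(C = \prod_i (\beta_1 / \beta_i)^{\alpha_i}\), \(\gamma_k = \sum_i \alpha_i (1 - \beta_1 / \beta_i)^k / k\), and \(\lambda_0 = 1\), \(\lambda_{k+1} = \frac{1}{k+1} \sum_{i=1}^{k+1} i \gamma_i \lambda_{k+1-i}\) (the paper writes \(\delta_k\) for these coefficients). The names series_coga_coef, dcoga_series, pcoga_series, and the truncation argument n_terms are hypothetical, and truncating at a fixed number of terms is a simplification.

```r
# Coefficients of the truncated series (hypothetical helper, not part of coga)
series_coga_coef <- function(shape, scale, n_terms = 200) {
  beta1 <- min(scale)                 # smallest scale parameter
  C <- prod((beta1 / scale)^shape)    # leading constant
  # gamma_k = sum_i alpha_i * (1 - beta1 / beta_i)^k / k, for k = 1..n_terms
  gam <- vapply(seq_len(n_terms),
                function(k) sum(shape * (1 - beta1 / scale)^k) / k,
                numeric(1))
  # lambda[m + 1] stores lambda_m; lambda_0 = 1
  lambda <- numeric(n_terms + 1)
  lambda[1] <- 1
  for (m in seq_len(n_terms)) {
    i <- seq_len(m)
    lambda[m + 1] <- sum(i * gam[i] * lambda[m + 1 - i]) / m
  }
  list(C = C, lambda = lambda, rho = sum(shape), beta1 = beta1)
}

# Density: C * sum_k lambda_k * dgamma(y, shape = rho + k, scale = beta1)
dcoga_series <- function(y, shape, scale, n_terms = 200) {
  p <- series_coga_coef(shape, scale, n_terms)
  k <- 0:n_terms
  vapply(y, function(v)
    p$C * sum(p$lambda * dgamma(v, shape = p$rho + k, scale = p$beta1)),
    numeric(1))
}

# Distribution function: the same series with pgamma in place of dgamma
pcoga_series <- function(w, shape, scale, n_terms = 200) {
  p <- series_coga_coef(shape, scale, n_terms)
  k <- 0:n_terms
  vapply(w, function(v)
    p$C * sum(p$lambda * pgamma(v, shape = p$rho + k, scale = p$beta1)),
    numeric(1))
}

# Usage: compare with a direct numerical convolution of two gamma densities
shape <- c(3, 4)
scale <- 1 / c(2, 3)   # rates 2 and 3 expressed as scales
direct <- integrate(function(x) dgamma(x, shape[1], scale = scale[1]) *
                                dgamma(2 - x, shape[2], scale = scale[2]),
                    lower = 0, upper = 2)$value
all.equal(dcoga_series(2, shape, scale), direct, tolerance = 1e-6)
```

Because each term is a gamma density or distribution function, the sketch can reuse dgamma and pgamma directly instead of evaluating powers and gamma functions by hand.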

Correctness

Assume that we have two independent random variables, \(X_1\) and \(X_2\), where \(X_1\) follows a gamma distribution with shape parameter \(3\) and rate parameter \(2\), and \(X_2\) follows a gamma distribution with shape parameter \(4\) and rate parameter \(3\). We calculate the density and distribution function of \(Y = X_1 + X_2\) and compare them with simulated samples.

Correctness check for density function:

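# Simulate from the convolution and evaluate the density on a grid
y <- rcoga(1000000, c(3,4), c(2,3))
grid <- seq(0, 8, length.out=1000)
pdf <- dcoga(grid, shape=c(3, 4), rate=c(2, 3))

plot(density(y), col="blue")   # kernel density estimate of the samples
lines(grid, pdf, col="red")    # density computed by dcoga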

Correctness check for distribution function:

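# Simulate from the convolution and evaluate the distribution function on a grid
y <- rcoga(1000000, c(3,4), c(2,3))
grid <- seq(0, 8, length.out=1000)
cdf <- pcoga(grid, shape=c(3, 4), rate=c(2, 3))

plot(ecdf(y), col="blue")      # empirical distribution function of the samples
lines(grid, cdf, col="red")    # distribution function computed by pcoga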
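
The visual comparisons can be complemented by a numerical one. The sketch below uses only base R: it draws from the sum directly with rgamma and compares the empirical distribution function with a reference obtained by numerically integrating the convolution integral. The helper name cdf_reference, the sample size, the grid, and the seed are illustrative assumptions. The same maximum-difference summary could be applied to values returned by pcoga.

```r
set.seed(1)
n <- 1e5
# Draw Y = X1 + X2 directly from the two gamma components
y <- rgamma(n, shape = 3, rate = 2) + rgamma(n, shape = 4, rate = 3)

# Reference CDF by numerical convolution:
# P(Y <= w) = integral over x in (0, w) of P(X2 <= w - x) * f_X1(x)
cdf_reference <- function(w) {
  if (w <= 0) return(0)
  integrate(function(x) pgamma(w - x, shape = 4, rate = 3) *
                        dgamma(x, shape = 3, rate = 2),
            lower = 0, upper = w)$value
}

grid <- seq(0, 8, length.out = 200)
reference <- vapply(grid, cdf_reference, numeric(1))
empirical <- ecdf(y)(grid)

# Largest absolute gap between the empirical and reference CDF on the grid
max_gap <- max(abs(empirical - reference))
max_gap
```

A small max_gap, of the order of one over the square root of the sample size, indicates that the simulated and computed distribution functions agree.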

Speed

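The ‘dcoga’ and ‘pcoga’ functions in package ‘coga’ are implemented in C++. The following experiment, run on an Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz, shows the speed advantage of the C++ code over pure R versions of the same functions. In the median timings reported below, ‘dcoga’ is roughly 20 times faster than its R counterpart, and ‘pcoga’ roughly 8 times faster.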

grid <- seq(0, 15, length.out=10)

microbenchmark::microbenchmark(
    dcoga(grid, shape=c(3,4,5), rate=c(2,3,4)),
    coga:::dcoga.R(grid, shape=c(3,4,5), rate=c(2,3,4)),
    pcoga(grid, shape=c(3,4,5), rate=c(2,3,4)),
    coga:::pcoga.R(grid, shape=c(3,4,5), rate=c(2,3,4))
)
## Unit: milliseconds
##                                                         expr       min
##           dcoga(grid, shape = c(3, 4, 5), rate = c(2, 3, 4))  1.310279
##  coga:::dcoga.R(grid, shape = c(3, 4, 5), rate = c(2, 3, 4)) 30.424495
##           pcoga(grid, shape = c(3, 4, 5), rate = c(2, 3, 4))  4.553233
##  coga:::pcoga.R(grid, shape = c(3, 4, 5), rate = c(2, 3, 4)) 37.759052
##         lq      mean    median        uq        max neval
##   1.483728  1.750481  1.757259  1.889137   2.677676   100
##  32.744795 39.082002 35.437426 38.668515 102.428354   100
##   4.958614  9.088243  5.221953  5.796708  49.863904   100
##  40.370152 53.267497 43.868735 72.830472  95.992447   100

Note: In this example, ‘dcoga.R’ and ‘pcoga.R’ are pure R versions of the density and distribution functions of the convolution of gamma distributions. These two functions are not exported by package ‘coga’, but you can still call them as ‘coga:::dcoga.R’ and ‘coga:::pcoga.R’.

The convolution of two gamma distributions is a special case of the convolution of gamma distributions. The functions ‘dcoga2dim’ and ‘pcoga2dim’ handle this case more efficiently than the general functions ‘dcoga’ and ‘pcoga’. In the median timings reported below, ‘dcoga2dim’ is roughly 300 times faster than ‘dcoga’, and ‘pcoga2dim’ roughly 16 times faster than ‘pcoga’.

grid <- seq(0, 15, length.out=100)

microbenchmark::microbenchmark(
    dcoga(grid, shape=c(3,4), rate=c(2,3)),
    dcoga2dim(grid, 3, 4, 2, 3),
    pcoga(grid, shape=c(3,4), rate=c(2,3)),
    pcoga2dim(grid, 3, 4, 2, 3))
## Unit: microseconds
##                                          expr       min         lq
##  dcoga(grid, shape = c(3, 4), rate = c(2, 3)) 16481.804 18782.0325
##                   dcoga2dim(grid, 3, 4, 2, 3)    58.021    62.3715
##  pcoga(grid, shape = c(3, 4), rate = c(2, 3)) 37958.314 39996.2400
##                   pcoga2dim(grid, 3, 4, 2, 3)  3815.581  3830.6490
##       mean    median        uq        max neval
##  27693.791 21054.628 41875.764  54540.518   100
##     72.482    71.025    76.619    144.368   100
##  57253.241 61935.891 67808.327 131291.390   100
##   4029.803  3842.713  4074.259   5741.707   100

Parameters Recycling

Please note that the functions dcoga, pcoga, and rcoga in this package accept shape and rate parameters of different lengths by recycling the shorter one. For example, dcoga(3, c(2,3), c(3,4,5,3,4)) and dcoga(3, c(2,3,2,3,2), c(3,4,5,3,4)) give the same result. If the length of the longer parameter is not a multiple of the length of the shorter one, these three functions issue a warning.
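
The recycling rule described above can be sketched in base R as follows. The function recycle_params is a hypothetical helper written for this illustration; it is not part of coga and may differ from how the package implements recycling internally.

```r
# Recycle shape and rate to a common length (hypothetical helper, not from coga)
recycle_params <- function(shape, rate) {
  stopifnot(length(shape) > 0, length(rate) > 0)
  n_long  <- max(length(shape), length(rate))
  n_short <- min(length(shape), length(rate))
  # Warn when the longer length is not a multiple of the shorter one
  if (n_long %% n_short != 0) {
    warning("longer parameter length is not a multiple of shorter parameter length")
  }
  list(shape = rep_len(shape, n_long), rate = rep_len(rate, n_long))
}

# shape c(2, 3) is extended to c(2, 3, 2, 3, 2) to match the five rates
p <- suppressWarnings(recycle_params(c(2, 3), c(3, 4, 5, 3, 4)))
p$shape
```

By the rule stated above, this particular pair of lengths (2 and 5) would presumably also produce the warning, since 5 is not a multiple of 2. The sketch suppresses the warning only to show the recycled values.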

References

[1] Moschopoulos, Peter G. “The distribution of the sum of independent gamma random variables.” Annals of the Institute of Statistical Mathematics 37.1 (1985): 541-544.

[2] Mathai, A. M. “Storage capacity of a dam with gamma type inputs.” Annals of the Institute of Statistical Mathematics 34 (1982): 591-597.