Introduction to sparsebn

Bryon Aragam

2016-08-19

sparsebn is an R package for learning sparse Bayesian networks and other graphical models from high-dimensional data via sparse regularization. It is designed to handle:

- Experimental data with interventions
- High-dimensional data with more variables than observations
- Both continuous and discrete data

The workhorse behind sparsebn is the sparsebnUtils package, which provides various S3 classes and methods for representing and manipulating graphs. The basic algorithms are implemented in ccdrAlgorithm and discretecdAlgorithm.

Overview

The main methods for learning graphical models are:

- estimate.dag for Bayesian networks (directed acyclic graphs)
- estimate.covariance for covariance matrices
- estimate.precision for precision (inverse covariance) matrices

Currently, estimation of precision and covariance matrices is limited to Gaussian data.
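
All three functions operate on data that has been wrapped into a sparsebnData object, as illustrated in the examples below. The following is a minimal sketch of the common workflow, where my.data is a placeholder for your own matrix or data frame:

library("sparsebn")
dat <- sparsebnData(my.data, type = "continuous") # wrap the raw data (my.data is a placeholder)
dags.out <- estimate.dag(data = dat)              # Bayesian networks (DAGs)
cov.out <- estimate.covariance(data = dat)        # covariance matrices
prec.out <- estimate.precision(data = dat)        # precision matrices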

Example: Learning a Markov Chain

Suppose that the data is generated by a simple Markov chain:

\[X_1\to X_2\to X_3.\]

Assume unit influences between variables, i.e. \(X_1\sim\mathcal{N}(0,1)\) and \(X_{j} = X_{j-1} + \varepsilon_j\) with \(\varepsilon_j\sim\mathcal{N}(0,1)\) for \(j>1\). If \(X=(X_1,X_2,X_3)\) then \(X=B^TX+\varepsilon\sim\mathcal{N}(0,\Sigma)\), where we use the following parameters:

\[ B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad \Omega = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}. \]
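
These parameters can be checked numerically from \(B\) and \(\Omega\) via the identity \(\Sigma = (I-B)^{-T}\Omega(I-B)^{-1}\) derived in the appendix; the following check uses base R only:

B <- rbind(c(0, 1, 0),
           c(0, 0, 1),
           c(0, 0, 0))
Omega <- diag(3)                               # unit error variances
id <- diag(3)                                  # 3x3 identity matrix
solve(t(id - B)) %*% Omega %*% solve(id - B)   # reproduces the Sigma shown above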

To generate data from this model, first define the mean vector and covariance matrix:

mean.vector <- rep(0, 3)
covariance.matrix <- rbind(c(1,1,1),
                           c(1,2,2),
                           c(1,2,3))

Then we can generate some data using mvtnorm::rmvnorm:

gaussian.data <- mvtnorm::rmvnorm(n = 100, mean = mean.vector, sigma = covariance.matrix)
colnames(gaussian.data) <- c("X1", "X2", "X3")

In order to use the methods in the sparsebn package, we need to indicate what kind of data we are working with by wrapping the data into a sparsebnData object:

library("sparsebn")
dat <- sparsebnData(gaussian.data, type = "continuous")

Now we can use this data to estimate \(B\):

dags.out <- estimate.dag(data = dat, 
                         lambdas.length = 20, 
                         edge.threshold = 10, 
                         verbose = FALSE)
dags.out
## sparsebn Solution Path
##  20 estimates for lambda in [0.1,10]
##  Number of edges per solution: 0-1-2-3-2-2-2-2-2-3-3-3-3-3-3-3-3-3-3-3
##  3 nodes
##  100 observations

Note that the output is a solution path (stored internally as a sparsebnPath object) instead of a single estimate. In order to select a particular DAG, we need to do model selection; the select.parameter method illustrated in the next example automates this, or we can simply index into the path directly.

As expected, the third estimate in our solution path recovers the true DAG:

dags.out[[3]]
## CCDr estimate
## 100 observations
## lambda = 6.158482
## 
## DAG: 
## [X1]   
## [X2]  X1   
## [X3]  X2
get.adjacency.matrix(dags.out[[3]])
## 3 x 3 sparse Matrix of class "dgCMatrix"
##    X1 X2 X3
## X1  .  1  .
## X2  .  .  1
## X3  .  .  .
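
The estimated structure can also be exported to other graph formats; for instance, to_igraph (used for plotting later in this vignette) converts the edge list to an igraph object:

ig <- to_igraph(dags.out[[3]]$edges)   # convert the edgeList to an igraph object
igraph::ecount(ig)                     # 2 edges: X1 -> X2 and X2 -> X3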

We can also use this data to directly estimate the covariance matrix \(\Sigma\):

cov.out <- estimate.covariance(data = dat)

Compared with the output of estimate.dag, which is a more complicated sparsebnPath object, the output of estimate.covariance (and also estimate.precision) is simply a list of matrices:

class(cov.out)
## [1] "list"

Let’s take a look at the third estimate in the solution path (corresponding to the correct estimate of \(B\) from before):

cov.out[[3]]
## 3 x 3 sparse Matrix of class "dgCMatrix"
##                                   
## [1,] 0.9271749 0.9330711 0.9073762
## [2,] 0.9330711 1.9621736 1.9081392
## [3,] 0.9073762 1.9081392 2.8119991
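
Since the true covariance matrix is known in this example, we can measure the estimation error directly, e.g. via the largest entrywise deviation from covariance.matrix as defined above:

max(abs(as.matrix(cov.out[[3]]) - covariance.matrix))   # maximum entrywise error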

If we increase our sample size to \(n=1000\), the estimate gets closer to the truth:

gaussian.data <- mvtnorm::rmvnorm(n = 1000, mean = mean.vector, sigma = covariance.matrix)
dat <- sparsebnData(gaussian.data, type = "continuous")
cov.out <- estimate.covariance(data = dat)
cov.out[[3]]
## 3 x 3 sparse Matrix of class "dgCMatrix"
##                                   
## [1,] 0.9552805 0.9891389 0.9877571
## [2,] 0.9891389 2.0272848 2.0244527
## [3,] 0.9877571 2.0244527 3.0240941

Example: Bayesian network repository

For a less trivial example, we will try to reconstruct the pathfinder network from the Bayesian network repository. The pathfinder network has 109 nodes and 195 edges.

First, load the data:

data(pathfinder)
dat <- sparsebnData(pathfinder$data, type = "c", ivn = NULL)
## A list of interventions was not specified: Assuming data is purely observational.

Note that we create a sparsebnData object with no interventions by setting ivn = NULL. This time, instead of automatically generating a grid of regularization parameters, we generate one manually (see ?generate.lambdas). Then we run the algorithm:

nn <- num.samples(dat) # number of samples in the dataset / equivalent to nrow(dat$data)
lambdas <- generate.lambdas(sqrt(nn), 0.05, lambdas.length = 50, scale = "linear")
dags.out <- estimate.dag(data = dat, 
                         lambdas = lambdas,
                         edge.threshold = 500, 
                         verbose = FALSE)
dags.out
## sparsebn Solution Path
##  50 estimates for lambda in [1.581139,31.62278]
##  Number of edges per solution: 0-16-22-31-34-36-40-59-58-61-62-63-63-65-69-93-106-107-108-108-108-108-108-108-108-108-108-108-108-108-109-113-117-119-119-120-123-128-133-134-138-143-156-172-179-195-210-243-268-395
##  109 nodes
##  1000 observations

Let’s visualize the solution with 195 edges. For this, we use the igraph package:

solution <- get.solution(dags.out, edges = 195)
plot(solution,
     layout = igraph::layout.circle(to_igraph(solution$edges)),
     vertex.label = NA,
     vertex.size = 5,
     vertex.color = gray(0.75),
     edge.color = gray(0),
     edge.width = 1,
     edge.arrow.size = .25)

For comparison, let’s plot the original pathfinder graph.

plot(pathfinder$dag,
     layout = igraph::layout.circle(to_igraph(pathfinder$dag)),
     vertex.label = NA,
     vertex.size = 5,
     vertex.color = gray(0.75),
     edge.color = gray(0),
     edge.width = 1,
     edge.arrow.size = .25)

We can also use the parameter selection technique described in \[4\] to automatically select a good index (see ?select.parameter for more details):

select.idx <- select.parameter(dags.out, dat)
solution <- get.solution(dags.out, index = select.idx)
plot(solution,
     layout = igraph::layout.circle(to_igraph(solution$edges)),
     vertex.label = NA,
     vertex.size = 5,
     vertex.color = gray(0.75),
     edge.color = gray(0),
     edge.width = 1,
     edge.arrow.size = .25)

Note that the output of estimate.dag is a list of graphs, i.e. without weights. In order to do inference and estimate the weights in these graphs, use estimate.parameters:

out.fit <- estimate.parameters(dags.out, data = dat)

The output is a list of weights, one for each value of \(\lambda\). The weights are given in terms of \((B,\Omega)\): each element is a list with components coefs (the edge weights \(B\)) and vars (the conditional variances \(\Omega\)). For example, we can see how the weight of the first node on the second changes as we decrease \(\lambda\):

unlist(lapply(out.fit, function(x) x$coefs[1,2]))
##  [1] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
##  [8] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## [15] 0.000000 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177
## [22] 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177
## [29] 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177
## [36] 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177
## [43] 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177
## [50] 1.142223
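
To extract the full set of weights for a single model, index into the list as before; for example, using the index select.idx chosen by select.parameter in the previous section:

selected.fit <- out.fit[[select.idx]]   # weights for the selected model
dim(selected.fit$coefs)                 # 109 x 109 matrix of edge weights (B)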

Appendix: Simulating data from a linear Gaussian SEM

The pathfinder dataset has been simulated from a linear Gaussian SEM. In this section, we illustrate how to do this from scratch.

We first need a DAG; for this example we will use the pathfinder network from the previous section. Load it as follows:

data(pathfinder)
B <- as.matrix(get.adjacency.matrix(pathfinder$dag)) # pathfinder network as an adjacency matrix

If \(X=B^TX+\varepsilon\sim\mathcal{N}(0,\Sigma)\) and \(\varepsilon\sim\mathcal{N}(0,\Omega)\), then \(X=(I-B^{T})^{-1}\varepsilon\), from which it follows that: \[ \Sigma = (I-B)^{-T}\Omega(I-B)^{-1}. \]

Assuming unit influences (i.e. \(\beta_{ij}=1\) if \(\beta_{ij}\ne 0\)) and unit variances (i.e. \(\omega_j^2=1\) for all \(j\)), we can then compute \(\Sigma\) directly:

id <- diag(rep(1, num.nodes(pathfinder$dag)))   # 109x109 identity matrix
Omega <- id                                     # conditional variances
Sigma <- solve(t(id - B)) %*% Omega %*% solve(id - B)

Finally, we can use the mvtnorm package to generate random multivariate Gaussian data:

set.seed(123)
nn <- 1000
gaussian.data <- mvtnorm::rmvnorm(nn, sigma = Sigma)
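
As a quick sanity check, the sample covariance of the simulated data should be close to the true \(\Sigma\) when \(n\) is large:

max(abs(cov(gaussian.data) - Sigma))   # entrywise error; small for n = 1000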

Instead of setting \(\beta_{ij}=1\), we can also use random edge weights:

B[B!=0] <- runif(n = num.edges(pathfinder$dag), min = 0.5, max = 2)
Sigma <- solve(t(id - B)) %*% Omega %*% solve(id - B)
gaussian.data <- mvtnorm::rmvnorm(nn, sigma = Sigma)
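
Finally, the simulated data can be wrapped into a sparsebnData object and passed to estimate.dag exactly as in the earlier examples; as before, we assign column names to label the nodes:

colnames(gaussian.data) <- paste0("V", 1:ncol(gaussian.data))   # generic node labels
dat <- sparsebnData(gaussian.data, type = "continuous")
dags.out <- estimate.dag(data = dat, verbose = FALSE)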

References

\[1\] Aragam, B. and Zhou, Q. (2015). Concave penalized estimation of sparse Gaussian Bayesian networks. The Journal of Machine Learning Research, 16(Nov):2273-2328.

\[2\] Fu, F., Gu, J., and Zhou, Q. (2014). Adaptive penalized estimation of directed acyclic graphs from categorical data. arXiv: 1403.2310.

\[3\] Aragam, B., Amini, A. A., and Zhou, Q. (2015). Learning directed acyclic graphs with penalized neighbourhood regression. arXiv: 1511.08963.

\[4\] Fu, F. and Zhou, Q. (2013). Learning sparse causal Gaussian networks with experimental intervention: Regularization and coordinate descent. Journal of the American Statistical Association, 108: 288-300.