sparsebn is an R package for learning sparse Bayesian networks and other graphical models from high-dimensional data via sparse regularization. It is designed to handle both continuous and discrete data, with or without experimental interventions.
The workhorse behind sparsebn is the sparsebnUtils package, which provides various S3 classes and methods for representing and manipulating graphs. The basic algorithms are implemented in ccdrAlgorithm and discretecdAlgorithm.
The main methods for learning graphical models are:

- estimate.dag for directed acyclic graphs (Bayesian networks).
- estimate.precision for undirected graphs (Markov random fields).
- estimate.covariance for covariance matrices.

Currently, estimation of precision and covariance matrices is limited to Gaussian data.
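All three estimators share the same basic interface, taking a sparsebnData object (introduced below) as their main argument. As a minimal sketch, where my.data is a hypothetical numeric matrix:
dat <- sparsebnData(my.data, type = "continuous")  # my.data: hypothetical input matrix
dags.out <- estimate.dag(data = dat)        # DAGs / Bayesian networks
prec.out <- estimate.precision(data = dat)  # undirected graphs
cov.out <- estimate.covariance(data = dat)  # covariance matrices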
Suppose that the data is generated by a simple Markov chain:
\[X_1\to X_2\to X_3.\]
Assume unit influences between variables, i.e. \(X_1\sim\mathcal{N}(0,1)\) and \(X_{j} = X_{j-1} + \varepsilon_j\) with \(\varepsilon_j\sim\mathcal{N}(0,1)\) for \(j>1\). If \(X=(X_1,X_2,X_3)\) then \(X=B^TX+\varepsilon\sim\mathcal{N}(0,\Sigma)\), where we use the following parameters:
\[ B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad \Omega = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 1 & 1 \end{pmatrix}. \]
To generate data from this model, first define the mean vector and covariance matrix:
mean.vector <- rep(0, 3)
covariance.matrix <- rbind(c(3, 2, 1),
                           c(2, 2, 1),
                           c(1, 1, 1))
Then we can generate some data using mvtnorm::rmvnorm:
gaussian.data <- mvtnorm::rmvnorm(n = 100, mean = mean.vector, sigma = covariance.matrix)
colnames(gaussian.data) <- c("X1", "X2", "X3")
In order to use the methods in the sparsebn package, we need to indicate what kind of data we are working with by wrapping the data into a sparsebnData object:
library("sparsebn")
dat <- sparsebnData(gaussian.data, type = "continuous")
Now we can use this data to estimate \(B\):
dags.out <- estimate.dag(data = dat,
                         lambdas.length = 20,
                         edge.threshold = 10,
                         verbose = FALSE)
dags.out
## sparsebn Solution Path
## 20 estimates for lambda in [0.1,10]
## Number of edges per solution: 0-1-2-3-2-2-2-2-2-3-3-3-3-3-3-3-3-3-3-3
## 3 nodes
## 100 observations
Note that the output is a solution path (stored internally as a sparsebnPath object), instead of a single estimate. In order to select a particular DAG, we need to do model selection; one approach, via select.parameter, is illustrated later in this vignette.
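In the meantime, the path is easy to inspect by hand. As a rough sketch, assuming each element of the path is a sparsebnFit object carrying the lambda component shown in its printed summary, and that num.edges (used later in this vignette) accepts such objects:
sapply(dags.out, function(fit) fit$lambda)      # regularization value for each estimate
sapply(dags.out, function(fit) num.edges(fit))  # edge counts, as in the printed summary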
As expected, the third estimate in our solution path recovers the true chain:
dags.out[[3]]
## CCDr estimate
## 100 observations
## lambda = 6.158482
##
## DAG:
## [X1]
## [X2] X1
## [X3] X2
get.adjacency.matrix(dags.out[[3]])
## 3 x 3 sparse Matrix of class "dgCMatrix"
## X1 X2 X3
## X1 . 1 .
## X2 . . 1
## X3 . . .
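The estimated edge list can also be converted for use with other graph packages, for example via to_igraph (used again below for plotting). A small sketch (output omitted):
g <- to_igraph(dags.out[[3]]$edges)  # convert the estimated edge list to an igraph object
igraph::ecount(g)                    # should report 2 edges, matching the chain X1 -> X2 -> X3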
We can also use this data to directly estimate the covariance matrix \(\Sigma\):
cov.out <- estimate.covariance(data = dat)
Compared with the output of estimate.dag, which is a more complicated sparsebnPath object, the output of estimate.covariance (and also estimate.precision) is simply a list of matrices:
class(cov.out)
## [1] "list"
Let’s take a look at the third estimate in the solution path (corresponding to the correct estimate of \(B\) from before):
cov.out[[3]]
## 3 x 3 sparse Matrix of class "dgCMatrix"
##
## [1,] 2.8119991 1.9081392 0.9073762
## [2,] 1.9081392 1.9621736 0.9330711
## [3,] 0.9073762 0.9330711 0.9271749
If we increase our sample size to \(n=1000\), the estimate gets closer to the truth:
gaussian.data <- mvtnorm::rmvnorm(n = 1000, mean = mean.vector, sigma = covariance.matrix)
dat <- sparsebnData(gaussian.data, type = "continuous")
cov.out <- estimate.covariance(data = dat)
cov.out[[3]]
## 3 x 3 sparse Matrix of class "dgCMatrix"
##
## [1,] 3.0240941 2.0244527 0.9877571
## [2,] 2.0244527 2.0272848 0.9891389
## [3,] 0.9877571 0.9891389 0.9552805
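To make "closer to the truth" concrete, one can compare the estimate against the true covariance matrix entrywise. A minimal check (output omitted):
max(abs(as.matrix(cov.out[[3]]) - covariance.matrix))  # maximum entrywise estimation error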
For a less trivial example, we will try to reconstruct the pathfinder network from the Bayesian network repository. The pathfinder network has 109 nodes and 195 edges.
First, load the data:
data(pathfinder)
dat <- sparsebnData(pathfinder$data, type = "c", ivn = NULL)
## A list of interventions was not specified: Assuming data is purely observational.
Note that we create a sparsebnData object with no interventions by setting ivn = NULL.
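For experimental data, ivn takes one entry per observation listing the nodes under intervention in that row. The exact encoding below (a vector of node names per row, with NULL for purely observational rows) is an assumption based on the package documentation, shown only as a sketch:
# Hypothetical sketch: three observations, the second generated under an
# intervention on node X1 (format assumed; length must equal nrow(data))
ivn.list <- list(NULL, "X1", NULL)
# dat <- sparsebnData(my.data, type = "continuous", ivn = ivn.list)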
This time, instead of automatically generating a grid of regularization parameters, we generate one manually (see ?generate.lambdas). Then we run the algorithm:
nn <- num.samples(dat) # number of samples in the dataset / equivalent to nrow(dat$data)
lambdas <- generate.lambdas(sqrt(nn), 0.05, lambdas.length = 50, scale = "linear")
dags.out <- estimate.dag(data = dat,
                         lambdas = lambdas,
                         edge.threshold = 500,
                         verbose = FALSE)
dags.out
## sparsebn Solution Path
## 50 estimates for lambda in [1.581139,31.62278]
## Number of edges per solution: 0-16-22-31-34-36-40-59-58-61-62-63-63-65-69-93-106-107-108-108-108-108-108-108-108-108-108-108-108-108-109-113-117-119-119-120-123-128-133-134-138-143-156-172-179-195-210-243-268-395
## 109 nodes
## 1000 observations
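Note that the endpoints of the reported path are exactly the grid we requested, running from sqrt(n) down to 5% of sqrt(n):
c(0.05 * sqrt(nn), sqrt(nn))  # 1.581139 and 31.62278, matching the printed range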
Let’s visualize the solution with 195 edges. For this, we use the igraph package:
solution <- get.solution(dags.out, edges = 195)
plot(solution,
     layout = igraph::layout.circle(to_igraph(solution$edges)),
     vertex.label = NA,
     vertex.size = 5,
     vertex.color = gray(0.75),
     edge.color = gray(0),
     edge.width = 1,
     edge.arrow.size = .25)
For comparison, let’s plot the original pathfinder graph.
plot(pathfinder$dag,
     layout = igraph::layout.circle(to_igraph(pathfinder$dag)),
     vertex.label = NA,
     vertex.size = 5,
     vertex.color = gray(0.75),
     edge.color = gray(0),
     edge.width = 1,
     edge.arrow.size = .25)
We can also use the parameter selection technique described in [4] to automatically select a good index (see ?select.parameter for more details):
select.idx <- select.parameter(dags.out, dat)
solution <- get.solution(dags.out, index = select.idx)
plot(solution,
     layout = igraph::layout.circle(to_igraph(solution$edges)),
     vertex.label = NA,
     vertex.size = 5,
     vertex.color = gray(0.75),
     edge.color = gray(0),
     edge.width = 1,
     edge.arrow.size = .25)
Note that the output of estimate.dag is a list of graphs, i.e. without weights. In order to do inference and estimate the weights in these graphs, use estimate.parameters:
out.fit <- estimate.parameters(dags.out, data = dat)
The output is a list of weights, one for each value of \(\lambda\). The weights are given in terms of \((B,\Omega)\), corresponding to the list list(coefs, vars). For example, we can see how the weight of the first node on the second changes as we decrease \(\lambda\):
unlist(lapply(out.fit, function(x) x$coefs[1,2]))
## [1] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## [8] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## [15] 0.000000 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177
## [22] 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177
## [29] 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177
## [36] 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177
## [43] 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177 1.058177
## [50] 1.142223
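Cross-referencing the lambda grid built earlier, the edge enters the model at the 16th step of the path (output omitted):
lambdas[16]  # largest lambda at which the (1,2) coefficient is nonzero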
The pathfinder dataset has been simulated from a linear Gaussian SEM. In this section, we illustrate how to simulate such data from scratch.
We first need a DAG; for this example we will again use the pathfinder network from the previous section. Load it as follows:
data(pathfinder)
B <- as.matrix(get.adjacency.matrix(pathfinder$dag)) # pathfinder network as an adjacency matrix
If \(X=B^TX+\varepsilon\) with \(\varepsilon\sim\mathcal{N}(0,\Omega)\), then \(X=(I-B^T)^{-1}\varepsilon\), so \(X\sim\mathcal{N}(0,\Sigma)\) with \[ \Sigma = (I-B)^{-T}\Omega(I-B)^{-1}. \]
Assuming unit influences (i.e. \(\beta_{ij}=1\) if \(\beta_{ij}\ne 0\)) and unit variances (i.e. \(\omega_j^2=1\) for all \(j\)), we can then compute \(\Sigma\) directly:
id <- diag(rep(1, num.nodes(pathfinder$dag))) # 109x109 identity matrix
Omega <- id # conditional variances
Sigma <- solve(t(id - B)) %*% Omega %*% solve(id - B)
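As a quick sanity check (not part of the original example), the resulting Sigma should be a valid covariance matrix, i.e. symmetric and positive definite:
isSymmetric(Sigma)  # should be TRUE
min(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values)  # should be positive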
Finally, we can use the mvtnorm package to generate random multivariate Gaussian data:
set.seed(123)
nn <- 1000
gaussian.data <- mvtnorm::rmvnorm(nn, sigma = Sigma)
Instead of setting \(\beta_{ij}=1\), we can also use random edge weights:
B[B!=0] <- runif(n = num.edges(pathfinder$dag), min = 0.5, max = 2)
Sigma <- solve(t(id - B)) %*% Omega %*% solve(id - B)
gaussian.data <- mvtnorm::rmvnorm(nn, sigma = Sigma)
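To close the loop, the simulated data can be fed straight back into the learning algorithm; a minimal sketch, using the default regularization grid:
dat <- sparsebnData(gaussian.data, type = "continuous")
dags.out <- estimate.dag(data = dat)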
[1] Aragam, B. and Zhou, Q. (2015). Concave penalized estimation of sparse Gaussian Bayesian networks. The Journal of Machine Learning Research, 16(Nov):2273-2328.

[2] Fu, F., Gu, J., and Zhou, Q. (2014). Adaptive penalized estimation of directed acyclic graphs from categorical data. arXiv:1403.2310.

[3] Aragam, B., Amini, A. A., and Zhou, Q. (2015). Learning directed acyclic graphs with penalized neighbourhood regression. arXiv:1511.08963.

[4] Fu, F. and Zhou, Q. (2013). Learning sparse causal Gaussian networks with experimental intervention: Regularization and coordinate descent. Journal of the American Statistical Association, 108:288-300.