Type: Package
Title: Setwise Hierarchical Rate of Erroneous Discovery
Version: 1.0.0
Maintainer: Toby Kenney <tkenney@mathstat.dal.ca>
Description: Setwise Hierarchical Rate of Erroneous Discovery (SHRED) methods for setwise variable selection with false discovery rate (FDR) control. Setwise variable selection means that sets of variables may be selected when the true variable cannot be identified. This allows us to maintain FDR control but increase power. Details of the SHRED methods are in Organ, Kenney & Gu (2026) <doi:10.48550/arXiv.2603.02160>.
License: GPL-3
Encoding: UTF-8
Imports: graphics, stats, ClustOfVar
NeedsCompilation: no
Packaged: 2026-03-07 12:42:44 UTC; tkenney
Author: Sarah Organ [aut], Toby Kenney [cre], Hong Gu [aut]
Repository: CRAN
Date/Publication: 2026-03-11 20:00:03 UTC

Cumulative sum-of-minimal-weights sizing function

Description

Calculates the sum-of-minimal-weights sizing function for the initial elements of a hierarchical tree.

Usage

CumMinWeights(weights,parents)

Arguments

weights

A vector of weights.

parents

A vector giving the index of the parent node for each node in the hierarchical tree or forest (NA for root nodes).

Details

For a subset A of the hierarchical tree, the minimal elements of A are those elements that have no proper descendants in A. The sum-of-minimal-weights sizing function assigns to A the sum of the weights of its minimal elements. Given a vector of weights and the hierarchical tree structure, this function calculates the sum-of-minimal-weights for every initial subset {1,...,k} of this vector.
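As an illustration of this definition, the sizing function for a single fixed subset could be computed as follows. This is a minimal sketch, not the package's implementation; the function name sum.minimal.weights is hypothetical.

```r
## Sketch: sum-of-minimal-weights for one fixed subset A of a tree
## given by a parent vector (NA for roots).
sum.minimal.weights <- function(A, weights, parents) {
  ## An element of A is minimal if no other element of A is a proper
  ## descendant of it.  Walk up from each element of A and mark every
  ## strict ancestor as non-minimal.
  minimal <- rep(TRUE, length(weights))
  minimal[-A] <- FALSE
  for (i in A) {
    p <- parents[i]
    while (!is.na(p)) {
      minimal[p] <- FALSE
      p <- parents[p]
    }
  }
  sum(weights[minimal])
}

parents <- c(NA, 1, 1)   # tiny tree: node 1 with children 2 and 3
weights <- c(1, 0.5, 0.5)
sum.minimal.weights(c(1, 2), weights, parents) # only node 2 is minimal: 0.5
```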

Value

A numerical vector of the sizing function.

Author(s)

Sarah Organ, Toby Kenney, Hong Gu

Examples

set.seed(1)
pv<-rbeta(31,1,5)
parents<-c(NA,rep(seq_len(15),each=2)) # perfect binary tree
weights<-2^-c(5,rep(4,2),rep(3,4),rep(2,8),rep(1,16)) #weighted by
                                                      #no. of leaves

permutation<-order(runif(31)) # random permutation
permutation.inv<-rep(0,31)
permutation.inv[permutation]<-seq_len(31) #inverse permutation

### change weights and parents to the new permutation
weights.ordered<-weights[permutation]
parents.ordered<-permutation.inv[parents[permutation]]

### Compute sum minimal weights
CumMinWeights(weights.ordered,parents.ordered)

Hierarchical Generalised Linear Step-Up Procedure

Description

Performs the Generalised Linear Step-up Procedure (GLSUP) on p-values arranged in a hierarchical tree.

Usage

HGLSUP(pvals,weights,parents,threshold)

Arguments

pvals

A vector of p-values.

weights

A vector of weights for each hypothesis.

parents

A vector giving the index of the parent hypothesis for each hypothesis in the hierarchical tree or forest (NA for root hypotheses).

threshold

The cut-off slope.

Details

The GLSUP with sizing function s(A) on subsets of the set of hypotheses, and cut-off slope a, rejects all hypotheses with p-values less than a cut-off c, where c is the largest cut-off such that s({i | p_i<c })>=ac. This function performs the GLSUP for a set of hypotheses arranged in a hierarchical tree or forest, with sizing function s(A) given by the sum of the weights of the minimal elements of the set A of hypotheses.
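To make the step-up rule concrete, here is a sketch of the flat (non-hierarchical) special case, where s(A) is simply the total weight of A; with unit weights and slope a = m/level it reduces to the usual BH procedure. The function name glsup.flat is hypothetical, and ties/strict-inequality details are glossed over.

```r
## Sketch of the generalised linear step-up rule in the flat case:
## reject all p-values at or below the largest p_(k) with
## s({i : p_i <= p_(k)}) >= a * p_(k).
glsup.flat <- function(pvals, weights, a) {
  ord <- order(pvals)
  s <- cumsum(weights[ord])           # sizing function of initial subsets
  ok <- which(s >= a * pvals[ord])    # candidate cut-offs meeting s >= a*c
  if (length(ok) == 0) return(integer(0))
  k <- max(ok)
  which(pvals <= pvals[ord][k])       # indices of rejected hypotheses
}

p <- c(0.001, 0.01, 0.2, 0.4)
glsup.flat(p, rep(1, 4), a = 4 / 0.05) # BH at level 0.05 with m = 4: rejects 1, 2
```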

Value

A list containing the following components:
"pv" The p-values in increasing order.
"ord" The order from the original vector of p-values.
"parent" The parent node of each p-value in the sorted list.
"weight" The weight assigned to each p-value in the sorted list.
"cum.weight" The cumulative sum of weights of minimal elements.
"selected" A vector of the indices of rejected hypotheses in the original order.
"pv.cut.off" The p-value cut-off below which hypotheses are rejected.

Author(s)

Sarah Organ, Toby Kenney, Hong Gu

References

Setwise Hierarchical Variable Selection and the Generalized Linear Step-Up Procedure for False Discovery Rate Control

Sarah Organ, Toby Kenney, Hong Gu

http://arxiv.org/abs/2603.02160

Examples

set.seed(1)
pv<-rbeta(31,1,5)
parents<-c(NA,rep(seq_len(15),each=2)) # perfect binary tree
weights<-2^-c(5,rep(4,2),rep(3,4),rep(2,8),rep(1,16))
ans<-HGLSUP(pv,weights,parents,sum(weights)*20) # cut-off slope for PRDS at level 0.05
ans$selected

SHRED setwise variable selection with FDR control

Description

Performs variable selection, allowing the selection of sets of surrogates using the SHRED method for FDR control. This allows selection of sets of variables from a hierarchical clustering of the predictors.

Usage

SHRED(x,y,test,method,level,weights=NULL)

## S3 method for class 'SHRED'
print(x,...)
## S3 method for class 'SHRED'
plot(x,...)

Arguments

x

A matrix of predictor variables

y

The response variable

test

Either one of the character strings "gaussian", "binomial" or "poisson", or else a list with two named components. The first component, "model", is a function that fits a model to the data, taking a formula as input. For Gaussian regression, the "lm" function can be used; for GLM fitting, you will need to write a wrapper function that sets the family and other parameters. The second component, "test", is a function that takes as input two nested models fitted with the "model" function and returns the p-value for the null hypothesis that the sub-model fits the data as well as the complete model.

The "make.test" function creates these named lists for the standard strings, and can be used as a template for creating more general lists.

method

The method used to calculate the cut-off in the SHRED method. Choices are: "PRDS", which uses a cut-off that is guaranteed to control FDR under the PRDS assumption for the p-values of the tests; "Arbitrary", which uses a stricter cut-off that is guaranteed to control FDR under arbitrary dependence between p-values; "Heuristic", which uses the cut-off for weighted BH, which is not guaranteed to control FDR under hierarchically clustered hypotheses, but usually performs well in practice.

Alternatively, method can be a fixed value, which is used as the cut-off value.

level

The desired level of FDR control. If "method" is numeric, this is ignored, and the desired level of control should be incorporated into the value provided. If "method" is one of the named options, then this is the level at which FDR control is desired. For method="Arbitrary" or method="PRDS", the true FDR will usually be lower than this; for method="Heuristic", the true FDR could be higher than this, but in practice will often be close to this value.

weights

If weights=NULL (the default), the weight of each set is the inverse of the number of elements in the set, as suggested by Organ et al. (2026). Otherwise, weights should be a function that takes as input a vector of set sizes and returns the corresponding weights.
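For illustration, a user-supplied weights function receives the vector of set sizes and returns one weight per set; a sketch (the name my.weights is hypothetical) that halves the weight with each extra element might look like:

```r
## Sketch of a custom weights argument: sizes in, weights out.
my.weights <- function(sizes) 2^-(sizes - 1)
my.weights(c(1, 2, 4))  # 1.0 0.5 0.125
```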

...

Additional graphics or printing parameters. For plot.SHRED, the graphics parameters are passed on to the underlying plotting functions. For print.SHRED, any additional parameters are ignored.

Details

The SHRED method hierarchically clusters the predictors, then tests all clades of variables in the hierarchical clustering for significance. It then uses a BH or BY style method to control weighted FDR in the set of selected sets of variables, where the weight for each set is the inverse of the number of variables contained in it. For each set, the conclusion of a rejected hypothesis is that at least one of the variables in the set is a true variable. Thus, a selected set is a false positive if none of the variables contained in it is a true variable, and a true positive if any variable is a true variable. Because of the hierarchical structure of SHRED, the sets selected are always disjoint.
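The default weighting described above can be computed directly from a clade membership matrix (rows = clades, columns = variables); this small sketch illustrates the inverse-size rule, not the package's internal code.

```r
## Clades of a toy clustering of 3 variables, as indicator rows.
cluster.matrix <- rbind(c(1, 1, 1),   # the full set
                        c(1, 1, 0),   # clade {1,2}
                        c(1, 0, 0),   # singletons
                        c(0, 1, 0),
                        c(0, 0, 1))
sizes <- rowSums(cluster.matrix)
weights <- 1 / sizes   # inverse of the number of variables in each set
weights                # 1/3, 1/2, 1, 1, 1
```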

Value

An object of class "SHRED" which contains the following components:
"X" The matrix of predictor variables.
"Y" The vector of response variables.
"method" The method used to calculate the cut-off slope.
"test" The test used to obtain p-values.
"level" The desired FDR control level.
"cut.off.slope" The slope of the threshold.
"threshold" The threshold for final selection (NA if no sets selected).
"cluster" The results of hierarchical clustering.
"cluster.matrix" A matrix giving the selected clades as indicator vectors.
"pv" The p-values for each test.
"ord" The order of the p-values.
"weight" The weights for each hypothesis.
"cum.weight" The sum of the weights for each selected number of hypotheses.
"selected" Logical vector indicating which sets were selected.
"selected.sets" Logical matrix whose rows correspond to selected sets of variables.
"selected.weight" The total weight of all rejected hypotheses.

Author(s)

Sarah Organ, Toby Kenney, Hong Gu

References

Setwise Hierarchical Variable Selection and the Generalized Linear Step-Up Procedure for False Discovery Rate Control

Sarah Organ, Toby Kenney, Hong Gu

http://arxiv.org/abs/2603.02160

Examples

set.seed(1)
X<-matrix(rnorm(200),20,10)%*%(diag(rep(1,10))-c(0.4,0.4,rep(0,8))%*%t(c(0.4,0.4,rep(0,8))))
Y<-rnorm(20)+X%*%c(0,3,0,3,3,0,0,0,0,0)

selection<-SHRED(X,Y,"gaussian","PRDS",0.05)
### This fits a linear model of Y on subsets of X and uses a cut-off
### that controls FDR at the 0.05 level under the PRDS assumption for
### the p-values of all null hypotheses.

print(selection)
plot(selection)

SHREDDER

Description

Performs variable selection with the SHREDDER method for FDR control.

Usage

SHREDDER(x,y,test,level,weights)

Arguments

x

A matrix of predictor variables

y

The response variable

test

Either one of the character strings "gaussian", "binomial" or "poisson", or else a list with two named components. The first component, "model", is a function that fits a model to the data, taking a formula as input. For Gaussian regression, the "lm" function can be used; for GLM fitting, you will need to write a wrapper function that sets the family and other parameters. The second component, "test", is a function that takes as input two nested models fitted with the "model" function and returns the p-value for the null hypothesis that the sub-model fits the data as well as the complete model.

The "make.test" function creates these named lists for the standard strings, and can be used as a template for creating more general lists.

level

The desired level of FDR control. The SHREDDER cut-off is guaranteed to control FDR at this level under the PRDS assumption for the p-values; the true FDR will usually be lower than this.

weights

If weights=NULL (the default), the weight of each set is the inverse of the number of elements in the set, as suggested by Organ et al. (2026). Otherwise, weights should be a function that takes as input a vector of set sizes and returns the corresponding weights.

Details

The SHREDDER method hierarchically clusters the predictors, then tests all clades of variables in the hierarchical clustering for significance. It then uses a BH or BY style method to control weighted FDR in the set of selected sets of variables, where the weight for each set is the inverse of the number of variables contained in it. For each set, the conclusion of a rejected hypothesis is that at least one of the variables in the set is a true variable. Thus, a selected set is a false positive if none of the variables contained in it is a true variable, and a true positive if any variable is a true variable. SHREDDER only selects sets of variables when the p-values for all larger sets are below the cut-off. Because of the hierarchical structure of SHREDDER, the sets selected are always disjoint.
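The extra SHREDDER condition, that a set is only eligible when every larger (ancestor) set is also below the cut-off, can be sketched as a downward propagation along the tree. This is an illustration of that one rule only (the function name shredder.select is hypothetical, and the subsequent step of keeping only disjoint sets is omitted).

```r
## Sketch: keep a rejection only if every strict ancestor was also
## rejected, so rejections propagate down from the roots.
shredder.select <- function(rejected, parents) {
  selected <- rejected
  for (i in seq_along(parents)) {
    p <- parents[i]
    while (!is.na(p)) {                    # walk up to the root
      if (!rejected[p]) selected[i] <- FALSE
      p <- parents[p]
    }
  }
  selected
}

parents  <- c(NA, 1, 1)                    # root with two children
rejected <- c(FALSE, TRUE, TRUE)           # children rejected, root not
shredder.select(rejected, parents)         # nothing survives: all FALSE
```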

Value

An object of class "SHREDDER" which contains the following components:
"X" The matrix of predictor variables.
"Y" The vector of response variables.
"method" The string "SHREDDER".
"test" The test used to obtain p-values.
"level" The desired FDR control level.
"cut.off.slope" The slope of the threshold.
"threshold" The threshold for final selection (NA if no sets selected).
"cluster" The results of hierarchical clustering.
"cluster.matrix" A matrix giving the selected clades as indicator vectors.
"pv" The p-values for each test.
"ord" The order of the p-values.
"weight" The weights for each hypothesis.
"cum.weight" The sum of the weights for each selected number of hypotheses.
"selected" Logical vector indicating which sets were selected.
"selected.sets" Logical matrix whose rows correspond to selected sets of variables.
"selected.weight" The total weight of all rejected hypotheses.

Author(s)

Sarah Organ, Toby Kenney, Hong Gu

References

Setwise Hierarchical Variable Selection and the Generalized Linear Step-Up Procedure for False Discovery Rate Control

Sarah Organ, Toby Kenney, Hong Gu

http://arxiv.org/abs/2603.02160

Examples

set.seed(1)
X<-matrix(rnorm(200),20,10)%*%(diag(rep(1,10))-c(0.4,0.4,rep(0,8))%*%t(c(0.4,0.4,rep(0,8))))
Y<-rnorm(20)+X%*%c(0,2,0,2,2,0,0,0,0,0)

selection<-SHREDDER(X,Y,"gaussian",0.05)
### This fits a linear model of Y on subsets of X and uses a cut-off
### that controls FDR at the 0.05 level under the PRDS assumption for
### the p-values of all null hypotheses.

selection
plot(selection)

Convert clustering to matrix form

Description

Converts clustering from ClustOfVar package to membership matrix format.

Usage

convert.to.matrix(clust)

Arguments

clust

A clustering from the ClustOfVar package.

Details

For a clustering of p variables, this converts the clustering into a list giving the parent of each node, together with a membership matrix for the clusters.

Value

A list containing the following components:
"parent" The parent node of each node in the clustering.
"matrix" A (2p-1) by p matrix of 0s and 1s whose [i,j] entry is 1 if variable j is in cluster i.
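Such a membership matrix can be built from an hclust-style $merge matrix, where negative entries denote leaves and positive entries denote earlier merges. The sketch below (the name merge.to.matrix is hypothetical, and the row ordering may differ from the package's) shows the idea rather than the package's implementation.

```r
## Sketch: membership matrix with one row per leaf (rows 1..p) and one
## per internal node (rows p+1..2p-1), from an hclust-style merge matrix.
merge.to.matrix <- function(merge, p) {
  M <- matrix(0, 2 * p - 1, p)
  M[cbind(1:p, 1:p)] <- 1               # singleton clusters: the leaves
  for (k in seq_len(p - 1)) {
    row <- function(j) if (j < 0) -j else p + j   # decode merge entries
    M[p + k, ] <- pmax(M[row(merge[k, 1]), ], M[row(merge[k, 2]), ])
  }
  M
}

m <- rbind(c(-1, -2), c(-3, 1))         # clustering ((1,2),3)
merge.to.matrix(m, 3)
```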

Author(s)

Sarah Organ, Toby Kenney, Hong Gu

Examples

set.seed(1)
X<-matrix(rnorm(200),20,10)
cl<-ClustOfVar::hclustvar(X)
convert.to.matrix(cl)

Compute cut-off slopes for SHRED

Description

Computes the cut-off slope that controls FDR at the specified level under the corresponding dependence assumptions.

Usage

PRDS.cutoff(weights,parents,level)
SHRED.cutoff(weights,parents,level)
SHREDDER.cutoff(weights,parents,level)

Arguments

weights

A vector of weights for each hypothesis.

parents

A vector giving the index of the parent hypothesis for each hypothesis in the hierarchical tree or forest (NA for root hypotheses).

level

The desired FDR control level.

Details

The GLSUP with sum-of-minimal-weights sizing function is proven to control gFDR for certain choices of cut-off slope, under various assumptions. These functions calculate the appropriate slope to control gFDR under the corresponding assumptions: PRDS.cutoff computes the cut-off slope that controls gFDR under the PRDS assumption; SHRED.cutoff computes the cut-off slope that guarantees gFDR control regardless of the dependency between p-values; SHREDDER.cutoff computes the cut-off slope that controls gFDR for the SHREDDER method under the PRDS assumption. This last cut-off also often works well in practice for the SHRED method.

Value

The cut-off slope to be used for the HGLSUP function.

Author(s)

Sarah Organ, Toby Kenney, Hong Gu

References

Setwise Hierarchical Variable Selection and the Generalized Linear Step-Up Procedure for False Discovery Rate Control

Sarah Organ, Toby Kenney, Hong Gu

http://arxiv.org/abs/2603.02160

Examples

parents<-c(NA,rep(seq_len(15),each=2)) # perfect binary tree
weights<-2^-c(5,rep(4,2),rep(3,4),rep(2,8),rep(1,16))
PRDS.cutoff(weights,parents,0.05)
SHRED.cutoff(weights,parents,0.05)
SHREDDER.cutoff(weights,parents,0.05)

Calculate p-values for set-wise variable selection

Description

Performs hypothesis tests for the null hypothesis that a set of predictors contains no true predictor.

Usage

get.p.vals(x,y,clust,test)

Arguments

x

The matrix of predictors.

y

A vector containing the response variable.

clust

A matrix whose rows are indicator vectors for subsets of predictors of X.

test

Either one of the character strings "gaussian", "binomial" or "poisson", or else a list with two components: "model", a function for fitting the model from a formula (typically "lm" for Gaussian regression, or a wrapper around "glm" for other GLM models); and "p.val", a test function based on a comparison of the two nested models, typically built on the "anova" function.

Details

The rows of the matrix clust define a collection of subsets of the predictor variables. This function computes, for each of these sets, the p-value for the hypothesis that the set contains no true predictors.
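One such test in the Gaussian case amounts to an F test (via anova) comparing the full model against the model that drops the variables in the set; this is a sketch under that assumption, not the package's internal code.

```r
## Sketch: p-value for "the set contains no true predictor", Gaussian case.
set.seed(1)
X <- matrix(rnorm(200), 20, 10)
Y <- drop(rnorm(20) + X %*% c(0, 2, rep(0, 8)))  # variable 2 is a true predictor
in.set <- c(FALSE, TRUE, rep(FALSE, 8))          # the set being tested: {2}

full <- lm(Y ~ X)               # model with all predictors
sub  <- lm(Y ~ X[, !in.set])    # model dropping the set
anova(sub, full)$"Pr(>F)"[2]    # p-value for the null that sub fits as well
```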

Value

A vector of p-values.

Author(s)

Sarah Organ, Toby Kenney, Hong Gu

Examples

set.seed(1)
X<-matrix(rnorm(200),20,10)%*%(diag(rep(1,10))-c(0.4,0.4,rep(0,8))%*%t(c(0.4,0.4,rep(0,8))))
Y<-rnorm(20)+X%*%c(0,2,0,2,2,0,0,0,0,0)
clusters<-ClustOfVar::hclustvar(X)
clust<-convert.to.matrix(clusters)
get.p.vals(X,Y,clust$matrix,"gaussian")