computeMultivariateDigitization {divergence} | R Documentation |
Computes the digitized (divergence-coded) form of a data matrix, along with related statistics and measures, given the data matrix, a baseline matrix, and the multivariate features of interest.
computeMultivariateDigitization(seMat, seMat.base, FeatureSets, computeQuantiles = TRUE, gamma = c(1:9/100, 1:9/10), beta = 0.95, alpha = 0.01, distance = "euclidean", verbose = TRUE, findGamma = TRUE, Groups = NULL, classes = NULL)
seMat |
SummarizedExperiment with assay to be digitized, in [0, 1], with each column corresponding to a sample and each row corresponding to a feature; usually in quantile form. |
seMat.base |
SummarizedExperiment with baseline assay in [0, 1], with each column corresponding to a sample and each row corresponding to a feature |
FeatureSets |
The multivariate features in list or matrix form. In list form, each list element should be a vector of individual features; in matrix form, it should be a binary matrix with rownames being individual features and column names being the names of the feature sets. |
computeQuantiles |
Apply quantile transformation to both data and baseline matrices (TRUE or FALSE; defaults to TRUE). |
gamma |
Range of gamma values to search through. By default gamma = 0.01, 0.02, ... 0.09, 0.1, 0.2, ..., 0.9. |
beta |
Parameter for eliminating outliers (0 < beta <= 1). By default beta=0.95. |
alpha |
Target expected proportion of divergent features per sample (0.01 in the usage above). The gamma value that yields this level of divergence in the baseline data is searched for.
distance |
Type of distance to be calculated between points. Any type of distance that can be passed on to the dist function can be used (default 'euclidean'). |
verbose |
Logical indicating whether to print status related messages during computation (defaults to TRUE). |
findGamma |
Logical indicating whether to search for optimal gamma values through the given gamma values (defaults to TRUE). If FALSE, the first value given in gamma will be used. |
Groups |
Factor indicating class association of samples |
classes |
Vector of class labels |
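The two accepted forms of FeatureSets described above can be illustrated with a small base R sketch. The set names and feature names here (SetA, SetB, G1 to G4) are made up for illustration and do not come from the package; the conversion code is only one possible way to move between the two forms.

```r
# Hypothetical feature sets; set and feature names are invented for illustration.
feature_list <- list(
  SetA = c("G1", "G2", "G3"),
  SetB = c("G2", "G4")
)

# Rows: every individual feature that appears in at least one set.
all_features <- sort(unique(unlist(feature_list)))

# Binary membership matrix: 1 if the feature (row) belongs to the set (column).
feature_matrix <- sapply(feature_list, function(s) as.integer(all_features %in% s))
rownames(feature_matrix) <- all_features

print(feature_matrix)

# Recover the list form from the matrix form.
list_again <- lapply(
  setNames(colnames(feature_matrix), colnames(feature_matrix)),
  function(set) rownames(feature_matrix)[feature_matrix[, set] == 1]
)
```

Either feature_list or feature_matrix has the shape the FeatureSets description asks for: a list of feature vectors, or a binary matrix with features as row names and set names as column names.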
A list with elements:
Mat.div: divergence coding of the data matrix in binary form, of the same dimensions as seMat.
baseMat.div: divergence coding of the base matrix in binary form, with the same column names as seMat.base, rows being multivariate features.
div: data frame with the number of divergent features in each sample.
features.div: data frame with the divergence probability of each feature; the divergence probability for each phenotype is included as well if the 'Groups' and 'classes' inputs were provided.
Baseline: a list containing a "Ranges" data frame with the baseline interval for each feature, a "Support" binary matrix of the same dimensions as Mat indicating whether each sample was a support for a feature or not (1 = support, 0 = not in the support), gamma: the selected gamma value, and alpha: the expected number of divergent features per sample computed over the baseline data matrix.
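How a gamma value might be selected from the search grid to match a target alpha can be sketched as follows. This is an illustration only, not the package's implementation: toy_baseline_alpha is a made-up stand-in for the baseline divergence level (assumed here to decrease as gamma grows), pick_gamma is a hypothetical helper, and the selection rule (smallest gamma that meets the target, falling back to the largest gamma) is an assumption. Only the default grid and the findGamma = FALSE behavior are taken from the documentation.

```r
# Default search grid from the usage line: 0.01, ..., 0.09, 0.1, ..., 0.9
gamma_grid <- c(1:9 / 100, 1:9 / 10)

# Made-up stand-in for the proportion of divergent features per baseline
# sample at a given gamma; in practice this comes from the baseline data.
toy_baseline_alpha <- function(g) 0.05 * exp(-10 * g)

# Hypothetical helper, not part of the package.
pick_gamma <- function(gammas, target_alpha, find_gamma = TRUE) {
  # With findGamma = FALSE the documentation says the first gamma is used.
  if (!find_gamma) return(gammas[1])
  alphas <- vapply(gammas, toy_baseline_alpha, numeric(1))
  ok <- which(alphas <= target_alpha)
  # Assumed rule: smallest gamma meeting the target; otherwise the largest gamma.
  if (length(ok) > 0) min(gammas[ok]) else max(gammas)
}

selected <- pick_gamma(gamma_grid, target_alpha = 0.01)
print(selected)
```

With this toy curve the search settles on 0.2, the first grid value at which the stand-in divergence level drops to 0.01 or below; with find_gamma = FALSE it simply returns 0.01, the first grid value.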
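# The example objects (breastTCGA_Mat, breastTCGA_Group, msigdb_Hallmarks)
# are assumed to be data sets provided by the divergence package.
library(divergence)
library(SummarizedExperiment)

# Split samples into baseline (normal) and data (non-normal) matrices
baseMat = breastTCGA_Mat[, breastTCGA_Group == "NORMAL"]
dataMat = breastTCGA_Mat[, breastTCGA_Group != "NORMAL"]

# Wrap each matrix in a SummarizedExperiment
seMat.base = SummarizedExperiment(assays=list(data=baseMat))
seMat = SummarizedExperiment(assays=list(data=dataMat))

# Compute the multivariate divergence coding for the given feature sets
div = computeMultivariateDigitization(
  seMat = seMat,
  seMat.base = seMat.base,
  FeatureSets = msigdb_Hallmarks
)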