edgeRselection {ClassifyR}    R Documentation

Feature Selection Based on Differential Expression for Count Data

Description

Performs a differential expression analysis between classes and chooses the features which have the best resubstitution performance. The data may be overdispersed, and this is modelled.

Usage

## S4 method for signature 'matrix'
edgeRselection(counts, classes, ...)
## S4 method for signature 'DataFrame'
edgeRselection(counts, classes, datasetName,
               normFactorsOptions = NULL, dispOptions = NULL, fitOptions = NULL,
               trainParams, predictParams, resubstituteParams,
               selectionName = "edgeR LRT", verbose = 3)
## S4 method for signature 'MultiAssayExperiment'
edgeRselection(counts, targets = NULL, ...)

Arguments

counts

Either a matrix, DataFrame or MultiAssayExperiment containing the unnormalised counts.

classes

A factor of class labels with the same length as the number of samples in counts. Not used if counts is a MultiAssayExperiment object.

targets

If counts is a MultiAssayExperiment, the names of the data tables of counts to be used.

...

Variables not used by the matrix or the MultiAssayExperiment method, which are passed into and used by the DataFrame method.

datasetName

A name for the data set used. Stored in the result.

normFactorsOptions

A named list of any options to be passed to calcNormFactors.

dispOptions

A named list of any options to be passed to estimateDisp.

fitOptions

A named list of any options to be passed to glmFit.

trainParams

A container of class TrainParams describing the classifier to use for training.

predictParams

A container of class PredictParams describing how prediction is to be done.

resubstituteParams

An object of class ResubstituteParams describing the performance measure to consider and the numbers of top features to try for resubstitution classification.

selectionName

A name to identify this selection method by. Stored in the result.

verbose

Default: 3. A number between 0 and 3 for the amount of progress messages to give. This function only prints progress messages if the value is 3.
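
As an illustration of the normFactorsOptions, dispOptions and fitOptions arguments described above, the following is a minimal sketch of how a named list of options can be forwarded to a function with do.call. The helper callWithOptions and the function standIn are hypothetical and are not part of ClassifyR or edgeR; they exist only so the sketch runs in base R. The option names shown (method, robust, prior.count) are arguments of calcNormFactors, estimateDisp and glmFit respectively, but the forwarding mechanism is an assumption about how such lists are typically used.

```r
# Named lists whose element names match arguments of the edgeR functions.
normFactorsOptions <- list(method = "TMM")   # intended for calcNormFactors
dispOptions <- list(robust = TRUE)           # intended for estimateDisp
fitOptions <- list(prior.count = 0.125)      # intended for glmFit

# Hypothetical helper (not part of ClassifyR): call a function with one fixed
# first argument plus any optional named arguments supplied as a list.
callWithOptions <- function(fun, firstArg, options = NULL)
{
  paramList <- list(firstArg)
  if(!is.null(options))
    paramList <- append(paramList, options)
  do.call(fun, paramList)
}

# Stand-in for an edgeR function, used only so the sketch runs without edgeR.
standIn <- function(x, method = "none", ...) paste("method:", method)

withOptions <- callWithOptions(standIn, 1:3, normFactorsOptions)
withoutOptions <- callWithOptions(standIn, 1:3) # NULL options: defaults apply.
```

Leaving an options argument as NULL means the corresponding function is called with its default settings.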

Details

The differential expression analysis follows the standard edgeR steps of estimating library size normalisation factors, calculating dispersion, in this case robustly, and then fitting a generalised linear model followed by a likelihood ratio test.
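
The steps above can be sketched directly with edgeR. This is a minimal illustration on simulated counts, assuming edgeR is installed; the simulated data, the variable names and the choice of coef = 2 (the second column of a two-class design matrix) are assumptions made for the example and are not taken from the ClassifyR implementation.

```r
library(edgeR)

set.seed(1)
# Simulated counts (illustrative only): 200 features by 10 samples.
counts <- matrix(rnbinom(200 * 10, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("Gene", 1:200), paste0("Sample", 1:10)))
classes <- factor(rep(c("Control", "DPN"), each = 5))

countsList <- DGEList(counts, group = classes)
countsList <- calcNormFactors(countsList)       # Library size normalisation factors.
design <- model.matrix(~ classes)
countsList <- estimateDisp(countsList, design, robust = TRUE) # Robust dispersion.
fit <- glmFit(countsList, design)               # Generalised linear model.
result <- glmLRT(fit, coef = 2)                 # Likelihood ratio test between classes.

# Features ordered from smallest to largest p-value.
ranked <- rownames(result$table)[order(result$table[, "PValue"])]
head(ranked)
```

The ranking produced this way is the kind of ordering from which the top features are then tried for resubstitution classification.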

Data tables which consist entirely of non-numeric data cannot be analysed. If counts is an object of class MultiAssayExperiment, the factor of sample classes must be stored in the DataFrame accessible by the colData function with column name "class".
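
The layout required for a MultiAssayExperiment can be sketched as follows, assuming the MultiAssayExperiment package is installed. The simulated counts and the table name "RNA" are hypothetical and chosen only for illustration; "RNA" is the kind of name that targets would refer to.

```r
library(MultiAssayExperiment)

set.seed(1)
# Illustrative counts: 50 features by 6 samples.
counts <- matrix(rnbinom(50 * 6, mu = 100, size = 5), nrow = 50,
                 dimnames = list(paste0("Gene", 1:50), paste0("Sample", 1:6)))

# Sample information; the class factor must be in a column named "class".
sampleInfo <- S4Vectors::DataFrame(class = factor(rep(c("Control", "DPN"), each = 3)),
                                   row.names = colnames(counts))

# "RNA" is a hypothetical name for the table of counts.
dataset <- MultiAssayExperiment(experiments = list(RNA = counts),
                                colData = sampleInfo)

sampleClasses <- colData(dataset)[, "class"] # The factor of sample classes.
tableNames <- names(experiments(dataset))    # Names available for targets.
```

The row names of the sample information match the column names of the counts, which lets the samples be linked to their classes.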

Value

An object of class SelectResult, or a list of such objects if the classifier used for determining the specified performance metric made a number of prediction varieties.

Author(s)

Dario Strbenac

References

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, Mark D. Robinson, Davis McCarthy, and Gordon Smyth, 2010, Bioinformatics, Volume 26 Issue 1, https://academic.oup.com/bioinformatics/article/26/1/139/182458.

Examples

  if(require(parathyroidSE) && require(PoiClaClu))
  {
    data(parathyroidGenesSE)
    expression <- assays(parathyroidGenesSE)[[1]]
    sampleNames <- paste("Sample", 1:ncol(parathyroidGenesSE))
    colnames(expression) <- sampleNames
    DPN <- which(colData(parathyroidGenesSE)[, "treatment"] == "DPN")
    control <- which(colData(parathyroidGenesSE)[, "treatment"] == "Control")
    expression <- expression[, c(control, DPN)]
    classes <- factor(rep(c("Control", "DPN"), c(length(control), length(DPN))))
    expression <- expression[rowSums(expression > 1000) > 8, ] # Make small data set.
    
    selected <- edgeRselection(expression, classes, "DPN Treatment",
                   trainParams = TrainParams(classifyInterface),
                   predictParams = PredictParams(NULL),
                   resubstituteParams = ResubstituteParams(nFeatures = seq(10, 100, 10),
                                        performanceType = "balanced error", better = "lower"))
                                        
    head(selected@rankedFeatures[[1]])
    plotFeatureClasses(expression, classes, "ENSG00000044574",
                       dotBinWidth = 500, xAxisLabel = "Unnormalised Counts")
  }

[Package ClassifyR version 2.12.0 Index]