vsn                   package:vsn                   R Documentation

_V_a_r_i_a_n_c_e _s_t_a_b_i_l_i_z_a_t_i_o_n _a_n_d _c_a_l_i_b_r_a_t_i_o_n _f_o_r _m_i_c_r_o_a_r_r_a_y _d_a_t_a.

_D_e_s_c_r_i_p_t_i_o_n:

     Robust estimation of variance-stabilizing and calibrating 
     transformations for microarray data. This is the main function of
     this package; see also the vignette vsn.pdf.

_U_s_a_g_e:

     vsn(intensities,
         lts.quantile = 0.5,
         verbose      = TRUE,
         niter        = 10,
         cvg.check    = NULL,
         pstart       = NULL,
         describe.preprocessing = TRUE)

_A_r_g_u_m_e_n_t_s:

intensities: An object that contains intensity values from a microarray
          experiment. See 'getIntensityMatrix' for details. The
          intensities are assumed to be the raw scanner data,
          summarized over the spots by an image analysis program, and
          possibly "background" subtracted. The intensities must not be
          logarithmically or otherwise transformed, and not thresholded
          or "floored". NAs are not accepted. See details.

lts.quantile: Numeric. The quantile that is used for the resistant
          least trimmed sum of squares regression. Allowed values are
          between 0.5 and 1: a value of 0.5 corresponds to least
          median of squares regression, and a value of 1 to ordinary
          least sum of squares regression.

   niter: Integer. The number of iterations to be used in the least
          trimmed sum of squares regression.

 verbose: Logical. If TRUE, some messages are printed.

  pstart: Numeric vector. Starting values for the model parameters in
          the iterative parameter estimation algorithm. If NULL, the
          function tries to determine reasonable starting values from
          the distribution of 'intensities'.

describe.preprocessing: Logical. If TRUE, calibration and
          transformation parameters, plus some other information are
          stored in the 'preprocessing' slot of the returned object.
          See details.

cvg.check: List. If non-NULL, this allows finer control of the
          iterative least trimmed sum of squares regression. See
          details.

_D_e_t_a_i_l_s:

     The function calibrates for sample-to-sample variations through
     shifting and scaling, and transforms the intensities to a scale
     where the variance is approximately independent of the mean
     intensity. The variance stabilizing transformation is equivalent
     to the natural logarithm in the high-intensity range, and to a
     linear transformation in the low-intensity range. In an
     intermediate range, the _arsinh_ function interpolates smoothly
     between the two. The calibration consists of estimating an offset
     'offs[i]' and a scale factor 'fac[i]' for each column 'i' of the
     matrix 'intensities'. Thus, the calibration is:

     'intensities[k,i] <- intensities[k,i] * fac[i] + offs[i]'
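
     The behaviour described above can be checked numerically in base
     R. The following is a minimal sketch, not the package's code: the
     values of 'offs' and 'fac' are made up rather than fitted, and it
     assumes that the transformed value is simply the arsinh of the
     calibrated intensity.

```r
## Illustrative values only: 'offs' and 'fac' are made up, not fitted.
y    <- c(0, 10, 100, 1000, 10000, 100000)   # raw intensities of one column
offs <- -0.2                                 # hypothetical offset
fac  <- 0.01                                 # hypothetical scale factor

ycal <- y * fac + offs    # calibration: shift and scale
h    <- asinh(ycal)       # arsinh transformation (base R 'asinh')

## High-intensity range: asinh(x) approaches log(2 * x),
## that is, the natural logarithm plus a constant.
big <- 1e4
abs(asinh(big) - log(2 * big))

## Low-intensity range: asinh(x) is close to x itself (linear).
small <- 0.01
abs(asinh(small) - small)
```

     Both printed differences are tiny, which illustrates the
     logarithmic and linear limits of the transformation.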

     The parameters 'offs[i]' and 'fac[i]' are estimated through a
     robust variant of maximum likelihood. The model assumes that for
     the majority of genes the expression levels are not much different
     across the samples, i.e., that only a minority of genes (less than
     a fraction '1 - lts.quantile') is differentially expressed.
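
     The trimming idea behind the robust fit can be sketched in base R.
     This is an illustration of least trimmed squares selection on
     simulated residuals only; the residuals, the seed and the single
     global threshold are assumptions, and the selection heuristic
     actually used by 'vsn' may differ.

```r
set.seed(7)
## Simulated residuals: 90 well-behaved points plus 10 gross outliers.
resid <- c(rnorm(90), rnorm(10, mean = 8))

lts.quantile <- 0.5

## Keep the points whose squared residual does not exceed the
## 'lts.quantile' quantile of all squared residuals.
threshold <- quantile(resid^2, probs = lts.quantile)
keep      <- resid^2 <= threshold

sum(keep)            # number of points retained for the next fit
any(keep[91:100])    # are any of the simulated outliers retained?
```

     Only the retained points would enter the next round of parameter
     estimation, so the outlying points do not influence the fit.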

     *Format:* The format of the matrix of intensities is as follows:
     for the *two-color printed array technology*, each row corresponds
     to one spot, and the columns to the different arrays and
     wave-lengths (usually red and green, but could be any number). For
     example, if there are 10 arrays, the matrix would have 20 columns,
     columns 1...10 containing the green intensities, and 11...20 the
     red ones. In fact, the ordering of the columns does not matter to
     'vsn', but it is your responsibility to keep track of it for
     subsequent analyses. For *one-color arrays*, each row corresponds
     to a probe, and each column to an array.
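
     As a sketch of the two-color layout described above, the following
     base R code assembles a matrix for 10 arrays from simulated
     intensities. The object names, column labels and the exponential
     distribution are illustrative assumptions, not requirements of
     'vsn'.

```r
set.seed(42)
nspots  <- 200    # rows: one per spot
narrays <- 10     # arrays, each scanned at two wave-lengths

## Simulated raw-scale intensities (untransformed, no NAs).
green <- matrix(rexp(nspots * narrays, rate = 1 / 500), nrow = nspots,
                dimnames = list(NULL, paste("green", 1:narrays, sep = "")))
red   <- matrix(rexp(nspots * narrays, rate = 1 / 800), nrow = nspots,
                dimnames = list(NULL, paste("red", 1:narrays, sep = "")))

## Columns 1...10 hold the green channel, columns 11...20 the red one.
intensities <- cbind(green, red)

stopifnot(!any(is.na(intensities)))   # NAs are not accepted
dim(intensities)
```

     Keeping informative column names is one way to keep track of the
     column ordering for subsequent analyses.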

     *Performance:* This function is slow. That is due to the nested
     iteration loops of the numerical optimization of the likelihood
     function and the heuristic that identifies the non-outlying data
     points in the least trimmed squares regression. For large arrays
     with many tens of thousands of probes, you may want to consider
     random subsetting: that is, use only a subset of the rows of the
     data matrix 'intensities' (e.g., 10,000-20,000 rows) to fit the
     parameters, then apply the transformation to all the data using
     'vsnh'. An example of this can be seen in the function
     'normalize.AffyBatch.vsn', whose code you can inspect by typing
     'normalize.AffyBatch.vsn' on the R command line.
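
     A minimal sketch of the random subsetting step in base R follows.
     The matrix is simulated and the subset size is an arbitrary
     choice; the commented lines at the end indicate how the fit and
     the transformation might then be carried out with this package,
     following the calls shown in the Examples section, and are not run
     here.

```r
set.seed(1)
## Simulated stand-in for a large intensity matrix (50000 probes, 4 arrays).
intensities <- matrix(rexp(50000 * 4, rate = 1 / 1000), ncol = 4)

nsub <- 10000                                # illustrative subset size
sel  <- sample(nrow(intensities), nsub)      # random rows, without replacement
sub  <- intensities[sel, , drop = FALSE]     # data used for fitting only

dim(sub)

## With the package attached, one might then proceed as follows (not run):
##   fit    <- vsn(sub)
##   params <- preproc(description(fit))$vsnParams
##   hall   <- vsnh(intensities, params)      # transform all rows
```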

     *Calibration and transformation parameters:* The parameters are
     stored in the 'preprocessing' slot of the 'description' slot of
     the 'exprSet' object that is returned, in the form of a 'list'
     with three elements. Here, d denotes the number of columns and n
     the number of rows of the matrix of intensities:

        *  'vsnParams': a numeric vector of length 2*d that contains
           the parameters.

        *  'vsnParamsIter': a (2*d) x niter numeric matrix that
           contains the parameter trajectory during the iterative fit
           process (see 'vsnPlotPar').

        *  'vsnTrimSelection': a logical vector of length n that for
           each row of the intensities matrix reports whether it was
           below (TRUE) or above (FALSE) the trimming threshold.

     If 'intensities' has class 'exprSet', and its 'description' slot
     has class 'MIAME', then this list is appended to any existing
     entries in the 'preprocessing' slot. Otherwise, the 'description'
     object and its 'preprocessing' slot are created.

     By default, if 'cvg.check' is 'NULL', the function will run the
     fixed number 'niter' of iterations in the least trimmed sum of
     squares regression. More fine-grained control can be obtained by
     passing a list with elements 'eps' and 'n'. If the maximum change
     between transformed data values is smaller than 'eps' for 'n'
     subsequent iterations, then the iteration terminates.
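
     The stopping rule can be sketched in base R as follows. The helper
     'first_converged_iter' is hypothetical and not part of this
     package, the sequence of changes is made up, and the exact way in
     which 'vsn' counts iterations internally is an assumption.

```r
## A control list of the form expected by 'cvg.check' (values are illustrative).
cvg.check <- list(eps = 1e-3, n = 3)

## Hypothetical helper: given the maximum change observed in each iteration,
## return the first iteration at which the change has been smaller than 'eps'
## for 'n' iterations in a row, or NA if that never happens.
first_converged_iter <- function(changes, eps, n) {
  run <- 0
  for (k in seq_along(changes)) {
    run <- if (changes[k] < eps) run + 1 else 0
    if (run >= n) return(k)
  }
  NA_integer_
}

changes <- c(0.5, 0.2, 0.0005, 0.02, 0.0004, 0.0003, 0.0001)
first_converged_iter(changes, cvg.check$eps, cvg.check$n)   # 7
```

     Note that the small change at iteration 3 does not count towards
     convergence, because it is followed by a larger change at
     iteration 4.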

_V_a_l_u_e:

     An object of class 'exprSet'. Differences between the columns of
     the transformed intensities may be interpreted as "regularized" or
     "shrunken" log-ratios. For the calibration and transformation
     parameters, see the _Details_ section.

_A_u_t_h_o_r(_s):

     Wolfgang Huber <URL: http://www.dkfz.de/mga/whuber>

_R_e_f_e_r_e_n_c_e_s:

     Variance stabilization applied to microarray data calibration and
     to the quantification of differential expression, Wolfgang Huber,
     Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, Martin
     Vingron; Bioinformatics (2002) 18 Suppl.1 S96-S104.

     Parameter estimation for the calibration and variance
     stabilization of microarray data, Wolfgang Huber, Anja von
     Heydebreck, Holger Sueltmann, Annemarie Poustka, Martin Vingron;
     Statistical Applications in Genetics and Molecular Biology (2003)
     Vol. 2 No. 1, Article 3.
     http://www.bepress.com/sagmb/vol2/iss1/art3.

_S_e_e _A_l_s_o:

     'exprSet-class', 'MIAME-class', 'normalize.AffyBatch.vsn'

_E_x_a_m_p_l_e_s:

     data(kidney)

     if(interactive()) {
       x11(width=9, height=4.5)
       par(mfrow=c(1,2))
     }
     plot(log.na(exprs(kidney)), pch=".", main="log-log")

     vsnkid = vsn(kidney)   ## transform and calibrate
     plot(exprs(vsnkid), pch=".", main="h-h")

     if (interactive()) {
       x11(width=9, height=4)
       par(mfrow=c(1,3))
     }

     meanSdPlot(vsnkid)
     vsnPlotPar(vsnkid, "factors")
     vsnPlotPar(vsnkid, "offsets")

     ## this should always hold true
     params = preproc(description(vsnkid))$vsnParams
     stopifnot(all(vsnh(exprs(kidney), params) == exprs(vsnkid))) 

