5.LinearModels             package:limma             R Documentation

_L_i_n_e_a_r _M_o_d_e_l_s _f_o_r _M_i_c_r_o_a_r_r_a_y_s

_D_e_s_c_r_i_p_t_i_o_n:

     This page gives an overview of the LIMMA functions available to
     fit linear models and to interpret the results.

     The core of this package is the fitting of gene-wise linear models
     to microarray data. The basic idea is to estimate log-ratios
     between two or more target RNA samples simultaneously. See the
     _LIMMA User's Guide_ for several case studies.

_F_o_r_m_i_n_g _t_h_e _D_e_s_i_g_n _M_a_t_r_i_x:

     The function 'modelMatrix' is provided to assist with creation of
     an appropriate design matrix for two-color microarray experiments
     using a common reference. Design matrices for Affymetrix or
     single-color arrays can be easily created using the function
     'model.matrix', which is part of the R stats package. For direct
     two-color designs the design matrix often needs to be created by
     hand.
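
     As an illustration of the single-color case, the sketch below
     builds two equivalent design matrices with 'model.matrix'. The
     experiment layout (three control arrays followed by three treated
     arrays) and the names 'group', 'design' and 'design0' are
     assumptions made for this example only.

```r
# Hypothetical layout: six single-color arrays, three controls then three treated
group <- factor(rep(c("ctrl", "trt"), each = 3))

# Treatment-contrast parametrization: the first column is the control mean,
# the second column is the treated-minus-control difference
design <- model.matrix(~ group)

# Group-means parametrization: one column per group, no intercept
design0 <- model.matrix(~ 0 + group)

print(design)
print(design0)
```

     Both matrices describe the same model; they differ only in how
     the coefficients are to be interpreted.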

_F_i_t_t_i_n_g _M_o_d_e_l_s:

     There are four main functions in the package which fit linear
     models:

      '_l_m_F_i_t'  This is a high level function which accepts objects and
          provides an entry point to the following three functions.

      '_l_m._s_e_r_i_e_s'  Straightforward least squares fitting of a linear
          model for each gene.

      '_r_l_m._s_e_r_i_e_s'  An alternative to 'lm.series' using robust
          regression as implemented by the 'rlm' function in the MASS
          package.

      '_g_l_s._s_e_r_i_e_s'  Generalized least squares taking into account
          correlations between duplicate spots (i.e., replicate spots
          on the same array) or between technical replicates. The
          function 'duplicateCorrelation' is used to estimate the
          inter-duplicate correlation before using 'gls.series'.

     Each of these functions accepts essentially the same argument list
     and produces a fitted model object of the same form. The first
     function 'lmFit' formally produces an object of class 'MArrayLM'.
     The other three functions are lower level functions which produce
     similar output but in unclassed lists.

     The main argument is the *design matrix* which specifies which
     target RNA samples were applied to each channel on each array.
     There is considerable freedom to choose the design matrix - there
     is always more than one choice which is correct provided it is
     interpreted correctly. The fitted model object consists of
     coefficients, standard errors and residual standard errors for
     each gene.
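
     A minimal sketch of a gene-wise fit with 'lmFit' follows. The
     simulated expression matrix 'y', the two-group layout and the
     component names read from the fitted object ('coefficients',
     'stdev.unscaled', 'sigma') are assumptions based on current
     versions of the package and may differ in older releases.

```r
library(limma)

set.seed(42)
# Simulated log-expression values: 50 genes (rows) by 6 arrays (columns)
y <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)

# Hypothetical two-group design: three control and three treated arrays
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Fit the same linear model to every gene
fit <- lmFit(y, design)

# One row of coefficients per gene, one column per design column
dim(fit$coefficients)

# Ordinary standard errors: unscaled standard deviations multiplied by the
# residual standard deviation of each gene
se <- fit$stdev.unscaled * fit$sigma
```

     For a single gene, the coefficients should agree with those from
     an ordinary call such as 'lm(y[1, ] ~ group)'.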

     All the functions which fit linear models use 'unwrapdups', which
     provides a unified method for handling duplicate spots.

     All the above linear modeling functions accept two-color data in
     terms of log-ratios. See 6.SingleChannel for the modeling of
     two-color data in terms of the individual log-intensities.

_M_a_k_i_n_g _C_o_m_p_a_r_i_s_o_n_s _o_f _I_n_t_e_r_e_s_t:

     Once a linear model has been fit using an appropriate design
     matrix, the command 'makeContrasts' may be used to form a contrast
     matrix to make comparisons of interest. The fit and the contrast
     matrix are used by 'contrasts.fit' to compute fold changes and
     t-statistics for the contrasts of interest. This is a way to
     compute, for example, all possible pairwise comparisons between
     treatments in an experiment which compares many treatments to a
     common reference.
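
     The pairwise-comparison idea can be sketched as follows. The
     simulated data, the three treatment labels 'A', 'B' and 'C' and
     the contrast names are assumptions made for illustration.

```r
library(limma)

set.seed(1)
# Simulated log-expression values: 100 genes by 9 arrays, 3 arrays per treatment
y <- matrix(rnorm(100 * 9), nrow = 100, ncol = 9)
rownames(y) <- paste0("gene", seq_len(100))

group <- factor(rep(c("A", "B", "C"), each = 3))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)   # columns named A, B, C

fit <- lmFit(y, design)

# All pairwise comparisons between the three treatments
cont <- makeContrasts(BvsA = B - A, CvsA = C - A, CvsB = C - B,
                      levels = design)

# Re-express the fit in terms of the contrasts
fit2 <- contrasts.fit(fit, cont)
head(fit2$coefficients)
```

     Each column of 'fit2$coefficients' is the corresponding linear
     combination of the original coefficients, so the 'BvsA' column
     equals the 'B' column of 'fit$coefficients' minus the 'A' column.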

_A_s_s_e_s_s_i_n_g _D_i_f_f_e_r_e_n_t_i_a_l _E_x_p_r_e_s_s_i_o_n:

     After fitting a linear model, the standard errors are moderated
     with a simple empirical Bayes model using 'ebayes' or 'eBayes'. A
     moderated t-statistic and a log-odds of differential expression
     are computed for each contrast for each gene.
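
     A short sketch of the moderation step is given below, followed by
     a ranked table from 'topTable'. The simulated data, the spiked-in
     shift for the first ten genes and the coefficient name 'grouptrt'
     are assumptions for this example; the component names 't' and
     'lods' reflect current versions of the package.

```r
library(limma)

set.seed(7)
# Simulated log-expression values: 200 genes by 6 arrays
y <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)
rownames(y) <- paste0("gene", seq_len(200))

# Make the first ten genes differentially expressed in the treated arrays
y[1:10, 4:6] <- y[1:10, 4:6] + 3

group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Fit gene-wise models, then moderate the standard errors
fit <- eBayes(lmFit(y, design))

# Moderated t-statistics and log-odds, one column per coefficient
dim(fit$t)
dim(fit$lods)

# Ten top-ranked genes for the treatment effect
top <- topTable(fit, coef = "grouptrt", number = 10)
print(top)
```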

     'ebayes' and 'eBayes' use internal functions 'fitFDist',
     'tmixture.matrix' and 'tmixture.vector'.

     The function 'zscoreT' is sometimes used for computing z-score
     equivalents for t-statistics so as to place t-statistics with
     different degrees of freedom on the same scale. 'zscoreGamma' is
     used the same way with standard deviations instead of
     t-statistics. These functions are for research purposes rather
     than for routine use.
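
     The notion of a z-score equivalent can be illustrated in base R.
     The helper 't_to_z' below is hypothetical and is not the
     package's implementation of 'zscoreT'; it only shows the idea of
     mapping a t-statistic to the standard normal deviate with the
     same tail probability.

```r
# Hypothetical helper: normal deviate with the same tail probability as t
t_to_z <- function(t, df) {
  # Upper-tail probability of |t| on the log scale, which stays accurate
  # for extreme statistics
  logp <- pt(abs(t), df = df, lower.tail = FALSE, log.p = TRUE)
  # Matching normal quantile, with the sign of the original statistic restored
  sign(t) * qnorm(logp, lower.tail = FALSE, log.p = TRUE)
}

# The same t-statistic is less extreme on the normal scale when df is small
t_to_z(2, df = c(3, 10, 1e6))

# The transformation is symmetric about zero
t_to_z(c(-2, 0, 2), df = 5)
```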

_S_u_m_m_a_r_i_z_i_n_g _M_o_d_e_l _F_i_t_s:

     After the above steps the results may be displayed or further
     processed using:

      '_t_o_p_t_a_b_l_e' _o_r '_t_o_p_T_a_b_l_e'  Presents a list of the genes most
          likely to be differentially expressed for a given contrast.

      '_c_l_a_s_s_i_f_y_T_e_s_t_s_F'  Uses nested F-tests to classify the genes as
          up, down or even over the contrasts in the linear model with
          special attention to genes which are significant in more than
          one contrast. 'classifyTestsT' and 'classifyTestsP' are
          simpler methods using cutoffs for the t-statistics or
          p-values individually.

      '_F_S_t_a_t'  Computes an overall moderated F-statistic to test
          whether all the contrasts are equal to zero.

      '_h_e_a_t_d_i_a_g_r_a_m' _o_r '_h_e_a_t_D_i_a_g_r_a_m'  Allows visual comparison of the
          results across many different conditions in the linear model.
          Not the same as heatdiagrams produced by other packages! This
          function accepts a 'TestResults' matrix produced by
          'classifyTests'.

      '_v_e_n_n_C_o_u_n_t_s'  Accepts output from 'classifyTests' and counts the
          number of genes in each classification.

      '_v_e_n_n_D_i_a_g_r_a_m'  Accepts output from 'classifyTests' or
          'vennCounts' and produces a Venn diagram plot.

      '_w_r_i_t_e._f_i_t'  Writes an 'MarrayLM' object to a file.

     When evaluating test procedures with simulated or known results,
     the utility function 'auROC' can be used to compute the area under
     the Receiver Operating Characteristic (ROC) curve for the test
     results for a given probe.
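
     The quantity being computed can be illustrated with a base R
     sketch using the rank-sum identity for the area under the ROC
     curve. The helper 'auc_rank' and its arguments are hypothetical
     and are not the package's implementation of 'auROC'.

```r
# Hypothetical helper: area under the ROC curve from the rank-sum identity
# truth: logical or 0/1 vector, TRUE for cases that are truly positive
# stat:  test statistic, larger values meaning stronger evidence
auc_rank <- function(truth, stat) {
  truth <- as.logical(truth)
  n1 <- sum(truth)     # number of true positives in the data
  n0 <- sum(!truth)    # number of true negatives in the data
  r <- rank(stat)      # midranks, so ties contribute one half
  (sum(r[truth]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Statistics that rank every positive above every negative
auc_rank(c(1, 1, 0, 0), c(4, 3, 2, 1))

# Statistics that carry no information (all tied)
auc_rank(c(1, 1, 0, 0), c(1, 1, 1, 1))
```

     A value near 1 means the statistic ranks true positives ahead of
     true negatives; a value near 0.5 means it does no better than
     random ordering.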

_A_u_t_h_o_r(_s):

     Gordon Smyth

_R_e_f_e_r_e_n_c_e_s:

     Smyth, G. K. (2004). Linear models and empirical Bayes methods for
     assessing differential expression in microarray experiments.
     _Statistical Applications in Genetics and Molecular Biology_, *3*,
     No. 1, Article 3. <URL:
     http://www.bepress.com/sagmb/vol3/iss1/art3>

     Smyth, G. K., Michaud, J., and Scott, H. (2003). The use of
     within-array duplicate spots for assessing differential expression
     in microarray experiments. <URL:
     http://www.statsci.org/smyth/pubs/dupcor.pdf>

