5.LinearModels             package:limma             R Documentation

_L_i_n_e_a_r _M_o_d_e_l_s _f_o_r _M_i_c_r_o_a_r_r_a_y_s

_D_e_s_c_r_i_p_t_i_o_n:

     This page gives an overview of the LIMMA functions available to
     fit linear models and to interpret the results.

     The core of this package is the fitting of gene-wise linear models
     to microarray data. The basic idea is to estimate log-ratios
     between two or more target RNA samples simultaneously. See the
     _LIMMA User's Guide_ for several case studies.

_F_o_r_m_i_n_g _t_h_e _D_e_s_i_g_n _M_a_t_r_i_x:

     The function 'designMatrix' is provided to assist with creation of
     an appropriate design matrix for two-color microarray experiments
     using a common reference. Design matrices for Affymetrix or
     single-color arrays can be easily created using the ordinary R
     command 'model.matrix'. For direct two-color designs, the design
     matrix must be created by hand.
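
     As a minimal sketch of the 'model.matrix' route for single-color
     arrays, the following builds a group-means design matrix. The
     targets table, array names and group labels are hypothetical and
     chosen only for illustration.

```r
# Hypothetical single-color experiment: six arrays in three groups.
targets <- data.frame(
  Array = paste0("array", 1:6),
  Group = factor(c("WT", "WT", "KO", "KO", "OE", "OE"),
                 levels = c("WT", "KO", "OE"))
)

# Group-means parameterization: one column per group, no intercept.
design <- model.matrix(~ 0 + Group, data = targets)
colnames(design) <- levels(targets$Group)
rownames(design) <- targets$Array
print(design)
```

     Each row of the result corresponds to an array and each column to
     a coefficient to be estimated. Other parameterizations (for
     example with an intercept) are equally possible.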

_F_i_t_t_i_n_g _M_o_d_e_l_s:

     There are four functions in the package which fit linear models:

      '_l_m_F_i_t'  This is a high level function which accepts objects and
          provides an entry point to the following three functions.

      '_l_m._s_e_r_i_e_s'  Straightforward least squares fitting of a linear
          model for each gene.

      '_r_l_m._s_e_r_i_e_s'  An alternative to 'lm.series' using robust
          regression as implemented by the 'rlm' function in the MASS
          package.

      '_g_l_s._s_e_r_i_e_s'  Generalized least squares taking into account
          correlations between duplicate spots (i.e., replicate spots
          on the same array). The functions 'duplicateCorrelation' or
          'dupcor.series' are used to estimate the inter-duplicate
          correlation before using 'gls.series'.

     Each of these functions accepts essentially the same argument list
     and produces a fitted model object of the same form. The first
     function 'lmFit' formally produces an object of class 'MArrayLM'.
     The other three functions are lower level functions which produce
     similar output but in unclassed lists.

     The main argument is the *design matrix* which specifies which
     target RNA samples were applied to each channel on each array.
     There is considerable freedom in choosing the design matrix: more
     than one choice is always valid, provided the fitted coefficients
     are interpreted accordingly. The fitted model object consists of
     coefficients, standard errors and residual standard errors for
     each gene.
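
     The idea of gene-wise least squares can be sketched in base R as
     follows. This is an illustration only, not the package's own
     implementation: the data are simulated, and the variable names
     ('coefficients', 'sigma', 'stdev.unscaled') are chosen for this
     sketch.

```r
set.seed(1)

# Hypothetical data: 5 genes (rows) by 6 arrays (columns).
y <- matrix(rnorm(30), nrow = 5, ncol = 6,
            dimnames = list(paste0("gene", 1:5), paste0("array", 1:6)))
group <- factor(rep(c("A", "B", "C"), each = 2))
design <- model.matrix(~ 0 + group)

# lm.fit() accepts a matrix response, so every gene is fitted in one
# call; arrays must be in rows, hence the transpose.
fit <- lm.fit(design, t(y))

coefficients <- t(fit$coefficients)   # genes x coefficients
df.residual  <- fit$df.residual       # arrays minus coefficients
# Residual standard deviation for each gene.
sigma <- sqrt(colSums(fit$residuals^2) / df.residual)
# Unscaled standard errors; they depend only on the design matrix.
stdev.unscaled <- sqrt(diag(solve(crossprod(design))))
```

     The standard error of a coefficient for a given gene is then
     'sigma' for that gene multiplied by the matching element of
     'stdev.unscaled'.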

     All the functions which fit linear models use 'unwrapdups', which
     provides a unified method for handling duplicate spots.

_M_a_k_i_n_g _C_o_m_p_a_r_i_s_o_n_s _o_f _I_n_t_e_r_e_s_t:

     Once a linear model has been fit using an appropriate design
     matrix, the command 'makeContrasts' may be used to form a contrast
     matrix to make comparisons of interest. The fit and the contrast
     matrix are used by 'contrasts.fit' to compute fold changes and
     t-statistics for the contrasts of interest. This is a way to
     compute, for example, all possible pairwise comparisons between
     treatments in an experiment which compares many treatments to a
     common reference.
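
     The arithmetic behind a contrast matrix can be sketched in base R
     without the package functions. The coefficient values and the
     treatment names A, B and C below are hypothetical. Each
     coefficient is taken to be the log-ratio of one treatment versus
     the common reference.

```r
# Hypothetical fitted coefficients: treatments A, B, C versus reference.
coef <- matrix(c(1.0, 0.5, 0.2,
                 0.0, 2.0, 1.0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("gene1", "gene2"), c("A", "B", "C")))

# Contrast matrix: rows match the coefficients, columns are the
# pairwise comparisons of interest.
contrasts <- cbind("B-A" = c(-1, 1, 0),
                   "C-A" = c(-1, 0, 1),
                   "C-B" = c(0, -1, 1))
rownames(contrasts) <- colnames(coef)

# Estimated log-ratio for each contrast and each gene.
estimates <- coef %*% contrasts
print(estimates)
```

     Only the point estimates are shown here. The standard errors and
     t-statistics for the contrasts are not computed in this sketch.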

_A_s_s_e_s_s_i_n_g _D_i_f_f_e_r_e_n_t_i_a_l _E_x_p_r_e_s_s_i_o_n:

     After fitting a linear model, the standard errors are moderated
     with a simple empirical Bayes model using 'ebayes' or 'eBayes'. A
     moderated t-statistic and a log-odds of differential expression
     are computed for each contrast for each gene.
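
     The effect of moderation can be sketched in base R as below,
     where each gene's residual variance is shrunk towards a common
     prior value. The prior degrees of freedom 'd0' and prior variance
     's0sq' are assumed values for illustration only, as are the
     per-gene numbers. In practice the prior values are estimated from
     the data rather than fixed by hand.

```r
# Hypothetical per-gene results from a linear model fit.
beta  <- c(gene1 = 1.2, gene2 = 0.3, gene3 = -2.0)    # log-ratios
sigma <- c(gene1 = 0.10, gene2 = 0.50, gene3 = 1.00)  # residual SDs
stdev.unscaled <- sqrt(1 / 3)  # e.g. a mean of three replicates
df.residual <- 2

# Assumed prior values (not estimated in this sketch).
d0   <- 4
s0sq <- 0.25

# Posterior variance: weighted average of prior and sample variances.
s2.post <- (d0 * s0sq + df.residual * sigma^2) / (d0 + df.residual)

t.ordinary  <- beta / (sigma * stdev.unscaled)
t.moderated <- beta / (sqrt(s2.post) * stdev.unscaled)

# The moderated statistic is referred to d0 + df.residual degrees of
# freedom.
p.moderated <- 2 * pt(-abs(t.moderated), df = d0 + df.residual)
```

     Genes with very small sample variances have their t-statistics
     shrunk, while genes with large sample variances gain some power
     from the extra degrees of freedom.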

     'ebayes' and 'eBayes' use internal functions 'fitFDist',
     'tmixture.matrix' and 'tmixture.vector'.

     The function 'zscoreT' is sometimes used for computing z-score
     equivalents for t-statistics so as to place t-statistics with
     different degrees of freedom on the same scale. 'zscoreGamma' is
     used the same way with standard deviations instead of
     t-statistics. These functions are for research purposes rather
     than for routine use.
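
     The idea of a z-score equivalent can be sketched in base R by
     matching tail probabilities. This is a simplified illustration
     and not the package's implementation. A careful implementation
     would take more care with extreme tail values.

```r
# Hypothetical t-statistics with differing degrees of freedom.
tstat <- c(2.5, 2.5, -1.0)
df    <- c(3, 30, 10)

# Standard normal deviate with the same lower tail probability.
z <- qnorm(pt(tstat, df = df))
print(z)
```

     The same t-statistic maps to a smaller z-score when it has fewer
     degrees of freedom, which puts the statistics on a common scale.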

_S_u_m_m_a_r_i_z_i_n_g _M_o_d_e_l _F_i_t_s:

     After the above steps the results may be displayed or further
     processed using:

      '_t_o_p_t_a_b_l_e'  Presents a list of the genes most likely to be
          differentially expressed for a given contrast.

      '_c_l_a_s_s_i_f_y_T_e_s_t_s'  Uses nested F-tests to classify the genes as up,
          down or even over the contrasts in the linear model with
          special attention to genes which are significant in more than
          one contrast. 'classifyTestsT' and 'classifyTestsP' are
          simpler methods using cutoffs for the t-statistics or
          p-values individually.

      '_h_e_a_t_d_i_a_g_r_a_m'  Allows visual comparison of the results across
          many different conditions in the linear model. Not the same
          as heatdiagrams produced by other packages! This function
          accepts a classification matrix produced by 'classifyTests'.

      '_v_e_n_n_C_o_u_n_t_s'  Accepts output from 'classifyTests' and counts the
          number of genes in each classification.

      '_v_e_n_n_D_i_a_g_r_a_m'  Accepts output from 'classifyTests' or
          'vennCounts' and produces a Venn diagram plot.
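
     The simpler cutoff-based style of classification, followed by
     counting genes in each class, can be sketched in base R as below.
     This does not reproduce the nested F-test method. The
     t-statistics, contrast names and cutoff are hypothetical.

```r
# Hypothetical t-statistics: 5 genes by 2 contrasts.
tstat <- matrix(c( 4.1, -0.3,
                  -3.5, -5.2,
                   0.8,  3.9,
                   0.2,  0.1,
                   6.0,  4.4),
                ncol = 2, byrow = TRUE,
                dimnames = list(paste0("gene", 1:5),
                                c("AvsRef", "BvsRef")))
cutoff <- 3  # assumed threshold, for illustration only

# Classification matrix: 1 = up, -1 = down, 0 = not significant.
classification <- sign(tstat) * (abs(tstat) > cutoff)

# Count genes by pattern of significance across the two contrasts,
# which is the information a Venn diagram would display.
significant <- classification != 0
counts <- table(AvsRef = significant[, 1], BvsRef = significant[, 2])
print(counts)
```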

_A_u_t_h_o_r(_s):

     Gordon Smyth

_R_e_f_e_r_e_n_c_e_s:

     Smyth, G. K. (2003). Linear models and empirical Bayes methods for
     assessing differential expression in microarray experiments. <URL:
     http://www.statsci.org/smyth/pubs/ebayes.pdf>

     Smyth, G. K., Michaud, J., and Scott, H. (2003). The use of
     within-array duplicate spots for assessing differential expression
     in microarray experiments. <URL:
     http://www.statsci.org/smyth/pubs/dupcor.pdf>

