gfp {yeastExpData}    R Documentation

Yeast GFP Fusion Data

Description

This data frame contains data concerning the localization and abundance of various yeast proteins.

Usage

data(gfp)

Format

A data frame with 6234 observations on the following 33 variables.

orfid
a numeric vector of identifiers
yORF
a factor representing yeast ORF names, with levels YAL001C, YAL002W, etc. These are also the row names of the data frame.
gene_name
a factor representing corresponding yeast gene names, with levels AAC1, AAC3, etc.
GFP_tagged
a factor with levels not tagged and tagged, indicating whether or not the ORF was GFP tagged
GFP_visualized
a factor with levels not visualized and visualized, indicating whether or not GFP fluorescence was visualized
TAP_visualized
a factor with levels TAP visualized and not TAP visualized, indicating success of TAP tag
abundance
a numeric vector, giving estimated abundance in units of molecules per cell
error
a numeric vector of estimated errors in abundance for a subset of proteins, in the same units as abundance (see details below)
localization_summary
a factor with levels "" (the empty string), ER, ER to Golgi, ER,ambiguous, ER,ambiguous,bud, etc. Summarizes the information contained in the subsequent columns.

The following columns indicate whether or not the protein was localized in the specific region of the cell. A protein can be localized in more than one region.

ambiguous
a logical vector
mitochondrion
a logical vector
vacuole
a logical vector
spindle_pole
a logical vector
cell_periphery
a logical vector
punctate_composite
a logical vector
vacuolar_membrane
a logical vector
ER
a logical vector
nuclear_periphery
a logical vector
endosome
a logical vector
bud_neck
a logical vector
microtubule
a logical vector
Golgi
a logical vector
late_Golgi
a logical vector
peroxisome
a logical vector
actin
a logical vector
nucleolus
a logical vector
cytoplasm
a logical vector
ER_to_Golgi
a logical vector
early_Golgi
a logical vector
lipid_particle
a logical vector
nucleus
a logical vector
bud
a logical vector
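
Because a protein can be flagged in several regions, the logical columns can be summed row-wise to count localizations per protein. The following is a minimal sketch on a small made-up data frame (toy, with invented row names and values) so that it runs without the package. With the real data the same idea would apply to the logical columns of gfp, keeping in mind that ambiguous is also logical but is not itself a region.

```r
# Toy stand-in for a few rows of gfp; all values are invented for illustration.
toy <- data.frame(
  nucleus   = c(TRUE,  FALSE, TRUE,  FALSE),
  cytoplasm = c(TRUE,  TRUE,  FALSE, FALSE),
  ER        = c(FALSE, FALSE, FALSE, FALSE),
  row.names = c("orfA", "orfB", "orfC", "orfD")
)

# Pick out the logical columns, then count TRUE values per row.
is_loc <- vapply(toy, is.logical, logical(1))
n_regions <- rowSums(toy[, is_loc, drop = FALSE])

# Names of proteins observed in more than one region.
multi <- names(n_regions)[n_regions > 1]
```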

Explanations for missing abundance values are given by

missingAbundance
a factor with levels low signal, not visualized and technical problem

Details

The information on abundance is available in three columns: abundance, error and missingAbundance. abundance gives (where available) absolute protein abundances determined by quantitative Western blot analysis of TAP-tagged strains. Abundances that have a non-NA error value were measured in triplicate, with serial dilutions of purified TAP-tagged standards included in each gel, which substantially reduces the measurement error. In addition, for these strains, the tagged genes were confirmed to rescue the loss-of-function phenotype of the corresponding deletion strain. For rows where abundance is missing (NA), the missingAbundance column gives the reason. Possible reasons are:

"not visualized"
Either the tagging was unsuccessful or no signal was detected.
"low signal"
The tagging was successful, but the signal was not sufficiently high above background to permit accurate quantitation (the threshold is about 50 molecules/cell).
"technical problem"
The protein was detectable but could not be quantitated because it did not migrate as a single band or comigrated with the internal standards in the gel.
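
The relationship between abundance, error and missingAbundance can be explored with a few lines of R. This sketch uses an invented data frame (toy) that mimics only those three columns. It assumes, as an illustration rather than a documented guarantee, that missingAbundance is NA whenever abundance is present.

```r
# Toy stand-in for gfp; all values are invented for illustration.
toy <- data.frame(
  abundance = c(1200, NA, NA, 340, NA),
  error     = c(150,  NA, NA, NA,  NA),
  missingAbundance = factor(
    c(NA, "low signal", "not visualized", NA, "technical problem"),
    levels = c("low signal", "not visualized", "technical problem")
  )
)

# Tabulate the recorded reason for every row lacking an abundance estimate.
reasons <- table(toy$missingAbundance[is.na(toy$abundance)], useNA = "ifany")

# Rows measured in triplicate are those with a non-NA error value.
triplicate <- subset(toy, !is.na(error))
```

On the real data, replacing toy with gfp should give the number of ORFs per reason and the subset of proteins with the more reliable triplicate measurements.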

Replicate analysis for a subset of tagged strains found a linear correlation coefficient of R = 0.94, with the pairs of proteins having a median variation of a factor of 2.0. This error analysis does not account for potential alterations in the endogenous levels of the proteins caused by the fused tag, which may be particularly disruptive for small proteins.
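
To make the two summary statistics concrete, the sketch below computes a correlation coefficient and a median fold variation for hypothetical replicate pairs. The vectors rep1 and rep2 are invented, and computing the correlation on log-transformed values is an assumption; the source does not say which scale was used. The replicate measurements themselves are not part of this data frame.

```r
# Hypothetical replicate measurements (molecules per cell) for four proteins.
rep1 <- c(100, 1000,  5000, 20000)
rep2 <- c(200,  500, 10000, 10000)

# Pearson correlation; using the log scale here is an assumption.
r <- cor(log10(rep1), log10(rep2))

# Fold difference within each pair, always expressed as a ratio >= 1.
fold <- pmax(rep1 / rep2, rep2 / rep1)
median_fold <- median(fold)
```

A median fold variation of 2 means that, for half of the pairs, the two replicates differ by no more than a factor of two.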

Source

The data were obtained from http://yeastgfp.ucsf.edu/, which provides considerably more information as well as raw image data. This data frame was generated specifically from http://yeastgfp.ucsf.edu/allOrfData.txt

References

For the localization data: Huh et al., Nature 425, 686-691 (2003) – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14562095&dopt=Abstract

For the protein abundance data: Ghaemmaghami et al., Nature 425, 737-741 (2003) – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14562106&dopt=Abstract

Examples

data(gfp)
keep <- names(which(table(gfp$localization_summary) > 50))

if (require(lattice)) {
  bwplot(reorder(localization_summary, abundance, median, na.rm = TRUE) ~ log2(abundance), gfp,
         varwidth = TRUE,
         subset = localization_summary %in% keep)
} else {

  ## Base graphics fallback, used when lattice is unavailable
  opar <- par(las = 2, mar = par("mar") + c(3.5, 0, 0, 0))
  gfp_sub <- subset(gfp, localization_summary %in% keep)
  ## Drop unused factor levels so that empty categories are not plotted
  gfp_sub$localization_summary <- droplevels(gfp_sub$localization_summary)
  boxplot(log2(abundance) ~ reorder(localization_summary, abundance, median, na.rm = TRUE),
          data = gfp_sub, varwidth = TRUE)
  rm(gfp_sub)
  par(opar)

}


[Package yeastExpData version 0.9.13 Index]