RangedData-class           package:IRanges           R Documentation

_D_a_t_a _o_n _r_a_n_g_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     'RangedData' supports storing data, i.e. a set of variables, on a
     set of ranges spanning multiple spaces (e.g. chromosomes).
     Although the data is split across spaces, it can still be treated
     as one cohesive dataset when desired. In order to handle large
     datasets, the data values are stored externally to avoid copying,
     and the 'rdapply' function facilitates the processing of each
     space separately (divide and conquer).

_D_e_t_a_i_l_s:

     A 'RangedData' object consists of two primary components: a
     'RangesList' holding the ranges over multiple spaces and a
     parallel 'SplitXDataFrame', holding the split data. There is also
     an 'annotation' slot for denoting the source (e.g. the genome) of
     the ranges and/or data.

     There are two different modes of interacting with a 'RangedData'.
     The first mode treats the object as a contiguous "data frame"
     annotated with range information. The accessors 'start', 'end',
     and 'width' get the corresponding fields in the ranges as atomic
     integer vectors, undoing the division over the spaces. The '[['
     and matrix-style '[,' extraction and subsetting functions unroll
     the data in the same way. '[[<-' does the inverse. The number of
     rows is defined as the total number of ranges and the number of
     columns is the number of variables in the data. It is often
     convenient and natural to treat the data this way, at least when
     the data is small and there is no need to distinguish the ranges
     by their space.

     The other mode is to treat the 'RangedData' as a list, with an
     element (a virtual 'Ranges'/'XDataFrame' pair) for each space. The
     length of the object is defined as the number of spaces and the
     value returned by the 'names' accessor gives the names of the
     spaces. The list-style '[' subset function behaves analogously.
     The 'rdapply' function provides a convenient and formal means of
     applying an operation over the spaces separately. This mode is
     helpful when ranges from different spaces must be treated
     separately or when the data is too large to process over all
     spaces at once.
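
     The two modes can be pictured with base R alone. The sketch below
     is an analogy, not the 'RangedData' implementation: a hypothetical
     named list of per-space data frames ('by_space') stands in for the
     object, and all variable names are illustrative assumptions.

```r
## Analogy only: a base R stand-in for a dataset spread over two spaces.
by_space <- list(
  chr1 = data.frame(start = c(1L, 2L, 3L), end = c(4L, 5L, 6L),
                    score = c(10L, 2L, NA)),
  chr2 = data.frame(start = c(15L, 45L), end = c(15L, 100L),
                    score = c(0L, 3L))
)

## List-like mode: one element per space.
n_spaces <- length(by_space)
space_names <- names(by_space)

## Data-frame-like mode: undo the division over the spaces.
unrolled <- do.call(rbind, unname(by_space))
n_rows <- nrow(unrolled)
all_scores <- unrolled$score

## Per-space processing (divide and conquer): range widths in each space.
widths_by_space <- lapply(by_space, function(df) df$end - df$start + 1L)
```

     In this analogy, 'length' and 'names' describe the spaces, while
     'nrow' and column extraction on the unrolled table describe the
     ranges and variables.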

_A_c_c_e_s_o_r _m_e_t_h_o_d_s:

     In the code snippets below, 'x' is a 'RangedData' object.

     The following accessors treat the data as a contiguous dataset,
     ignoring the division into spaces:

      Array accessors:

           'nrow(x)': The number of ranges in 'x'.

           'ncol(x)': The number of data variables in 'x'.

           'dim(x)': An integer vector of length two, essentially
               'c(nrow(x), ncol(x))'.

           'rownames(x)': Gets the names of the ranges in 'x'.

           'colnames(x)': Gets the names of the variables in 'x'.

           'dimnames(x)': A list with two elements, essentially
               'list(rownames(x), colnames(x))'.


      Range accessors. The type of the return value depends on the type
          of 'Ranges'. For 'IRanges', it is an integer vector. In all
          cases, the number of elements equals 'nrow(x)'.

           'start(x)': The start value of each range.

           'width(x)': The width of each range.

           'end(x)': The end value of each range.


     These accessors make the object seem like a list along the spaces:

      'length(x)': The number of spaces (e.g. chromosomes) in 'x'.

      'names(x)': The names of the spaces (e.g. '"chr1"'). 'NULL' or a
          character vector of the same length as 'x'.

      'names(x) <- value': Set the names of the spaces, where 'value'
          is either 'NULL' or a character vector of the same length as
          'x'. 


     Other accessors:

      'annotation(object)': Here, 'object' is a 'RangedData' object.
          Gets the scalar string identifying the source of the data in
          some way (e.g. genome, experimental platform, etc.).

      'ranges(x)': Gets the ranges in 'x' as a 'RangesList'.

      'values(x)': Gets the data values in 'x' as a 'SplitXDataFrame'.


_C_o_n_s_t_r_u_c_t_o_r:


      'RangedData(ranges = IRanges(), ..., splitter = NULL, annotation
          = NULL)': Creates a 'RangedData' with the ranges in 'ranges'
          and variables given by the arguments in '...'.  See the
          constructor 'XDataFrame' for how the '...' arguments are
          interpreted. If 'splitter' is 'NULL', all of the ranges and
          values are placed into the same space, resulting in a
          single-space (length one) 'RangedData'. Otherwise, the ranges
          and values are split into spaces according to 'splitter',
          which is treated as a factor, like the 'f' argument in
          'split'. The annotation may be specified as a scalar string
          by the 'annotation' argument.
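
     Since 'splitter' is described as behaving like the 'f' argument in
     'split', the grouping can be previewed with base R. This is a
     minimal sketch using plain vectors rather than a 'RangedData'; the
     variable names are illustrative assumptions.

```r
## Base R preview of how a splitter groups values into spaces.
score <- c(10L, 2L, NA, 0L, 3L)
splitter <- c("chr1", "chr1", "chr1", "chr2", "chr2")

## The splitter is coerced to a factor; each level becomes one group.
by_space <- split(score, splitter)
group_names <- names(by_space)

## A numeric splitter is coerced the same way, so the groups are named
## after its distinct values and need not be contiguous in the input.
regrouped <- split(score[1:3], c(1, 2, 1))
```

     Here 'by_space' has one element per distinct splitter value, which
     corresponds to one space per level in the constructor.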


_C_o_e_r_c_i_o_n:


      'as.data.frame(x, row.names=NULL, optional=FALSE, ...)': Copy the
          start, end, width of the ranges and all of the variables as
          columns in a 'data.frame'. This is a bridge to existing
          functionality in R, but of course care must be taken if the
          data is large. Note that 'optional' and '...' are ignored.

      'as(from, "XDataFrame")': Like 'as.data.frame' above, except the
          result is an 'XDataFrame' and it probably involves less
          copying, especially if there is only a single space.

      'as(from, "RangedData")': Coerces 'from' to a 'RangedData',
          according to its class:

          _X_R_l_e The bounds of the runs become the ranges and the values
               become a column named 'score'.


_S_u_b_s_e_t_t_i_n_g _a_n_d _R_e_p_l_a_c_e_m_e_n_t:

     In the code snippets below, 'x' is a 'RangedData' object.


      'x[i]': Subsets 'x' by indexing into its spaces, so the result is
          of the same class, with a different set of spaces. 'i' can be
          numeric, logical, 'NULL' or missing.

      'x[i,j]': Subsets 'x' by indexing into its rows and columns. The
          result is of the same class, with a different set of rows and
          columns. Note that this differs from the subset form above,
          because we are now treating 'x' as one contiguous dataset.

      'x[[i]]': Extracts a variable from 'x', where 'i' can be a
          character, numeric, or logical scalar that indexes into the
          columns. The variable is unlisted over the spaces.

      'x[[i]] <- value': Sets 'value' as column 'i' in 'x', where 'i'
          can be a character, numeric, or logical scalar that indexes
          into the columns. The length of 'value' should equal
          'nrow(x)', or 'value' may be 'NULL' to remove the column.
          Otherwise, 'x[[i]]' should be identical to 'value' after
          this operation.


_S_p_l_i_t_t_i_n_g _a_n_d _C_o_m_b_i_n_i_n_g:

     In the code snippets below, 'x' is a 'RangedData' object.


      'split(x, f, drop = FALSE)': Split 'x' according to 'f', which
          should be of length equal to 'nrow(x)'. Note that 'drop' is
          ignored here. The result is a 'RangedDataList' where every
          element has the same length (number of spaces) but different
          sets of ranges within each space.

      'c(x, ..., recursive = FALSE)': Combines 'x' with arguments
          specified in '...', which must all be 'RangedData' instances.
          This combination treats each argument as a list of spaces,
          so the result contains the spaces of the first argument
          followed by the spaces of the second, and so on. This
          function is useful when creating 'RangedData' instances on a
          space-by-space basis and then needing to combine them.
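
     The list-like concatenation described for 'c' mirrors how base R
     combines named lists. The sketch below is an analogy with plain
     lists standing in for single-space objects; it does not call the
     'RangedData' method, and the names are illustrative assumptions.

```r
## Analogy only: each list stands in for a single-space dataset.
first <- list(chr1 = c(10L, 2L))
second <- list(chr2 = c(0L, 3L, NA))

## Concatenation appends the elements (spaces) in argument order.
combined <- c(first, second)
combined_names <- names(combined)
```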


_A_u_t_h_o_r(_s):

     Michael Lawrence

_S_e_e _A_l_s_o:

     RangedData-utils for utilities and the 'rdapply' function for
     applying a function to each space separately.

_E_x_a_m_p_l_e_s:

       ranges <- IRanges(c(1,2,3),c(4,5,6))
       filter <- c(1L, 0L, 1L)
       score <- c(10L, 2L, NA)

       ## constructing RangedData instances

       ## no variables
       rd <- RangedData()
       rd <- RangedData(ranges)
       ranges(rd)
       ## one variable
       rd <- RangedData(ranges, score)
       rd[["score"]]
       ## multiple variables
       rd <- RangedData(ranges, filter, vals = score)
       rd[["vals"]] # same as rd[["score"]] above
       rd[["filter"]]
       rd <- RangedData(ranges, score + score)
       rd[["score...score"]] # names made valid
       ## use an annotation
       rd <- RangedData(ranges, annotation = "hg18")
       annotation(rd)

       ## split some data over chromosomes

       range2 <- IRanges(start=c(15,45,20,1), end=c(15,100,80,5))
       both <- c(ranges, range2)
       score <- c(score, c(0L, 3L, NA, 22L))
       filter <- c(filter, c(0L, 1L, NA, 0L)) 
       chrom <- paste("chr", rep(c(1,2), c(length(ranges), length(range2))), sep="")

       rd <- RangedData(both, score, filter, splitter = chrom, annotation = "hg18")
       rd[["score"]] # identical to score
       rd[1][["score"]] # identical to score[1:3]

       ## subsetting

       ## list style: [i]

       rd[numeric()] # these three are all empty
       rd[logical()]
       rd[NULL]
       rd[] # missing, full instance returned
       rd[FALSE] # logical, supports recycling
       rd[c(FALSE, FALSE)] # same as above
       rd[TRUE] # like rd[]
       rd[c(TRUE, FALSE)]
       rd[1] # numeric index
       rd[c(1,2)]
       rd[-2]

       ## matrix style: [i,j]

       rd[,NULL] # no columns
       rd[NULL,] # no rows
       rd[,1]
       rd[,1:2]
       rd[,"filter"]
       rd[1,] # now by the rows
       rd[c(1,3),]
       rd[1:2, 1] # row and column
       rd[c(1:2,1,3),1] ## repeating rows

       ## variable replacement

       count <- c(1L, 0L, 2L)
       rd <- RangedData(ranges, count, splitter = c(1, 2, 1))
       ## adding a variable
       score <- c(10L, 2L, NA)
       rd[["score"]] <- score
       rd[["score"]] # same as 'score'
       ## replacing a variable
       count2 <- c(1L, 1L, 0L)
       rd[["count"]] <- count2
       ## numeric index also supported
       rd[[2]] <- score
       rd[[2]] # gets 'score'
       ## removing a variable
       rd[[2]] <- NULL
       ncol(rd) # is only 1

       ## combining/splitting

       rd <- RangedData(ranges, score, splitter = c(1, 2, 1))
       c(rd[1], rd[2]) # equal to 'rd'
       rd2 <- RangedData(ranges, score)
       unlist(split(rd2, c(1, 2, 1))) # same as 'rd'

