This function performs feature normalization according to user-specified parameters.
normalize.feat(siamcat, norm.method = c("rank.unit", "rank.std", "log.std", "log.unit", "clr"), norm.param = list(log.n0 = 1e-08, sd.min.q = 0.1, n.p = 2, norm.margin = 1), verbose = 1)
Argument | Description |
---|---|
siamcat | an object of class siamcat |
norm.method | string, the normalization method; one of "rank.unit", "rank.std", "log.std", "log.unit", or "clr" |
norm.param | list specifying the parameters of the chosen normalization method; see the description of each method below |
verbose | controls the amount of output; defaults to 1 |
The function returns an object of class siamcat.
There are five different normalization methods available:
"rank.unit": converts features to ranks and normalizes each column (= sample) by the square root of the sum of ranks
"rank.std": converts features to ranks and applies z-score standardization
"clr": centered log-ratio transformation (after the addition of pseudocounts)
"log.std": log-transforms features (after the addition of pseudocounts) and applies z-score standardization
"log.unit": log-transforms features (after the addition of pseudocounts) and then normalizes them with a chosen vector norm, either per feature or per sample, or by the global maximum
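The two parameter-free descriptions above (rank.unit, and clr once a pseudocount is fixed) can be illustrated with a minimal base-R sketch. This is not SIAMCAT's internal implementation: it assumes a plain numeric matrix with features in rows and samples in columns, and the toy matrix `feat` and the helpers `rank_unit` and `clr` are hypothetical. The data-driven estimate of log.n0 (used when it is NULL) is not reproduced; the pseudocount is passed explicitly.

```r
# Sketch only, not SIAMCAT's code. Assumption: features in rows, samples in
# columns; the toy data and helper names are hypothetical.
feat <- matrix(c(0, 5, 10, 20,
                 3, 0,  7,  1,
                 8, 2,  0,  4),
               nrow = 4,
               dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))

# rank.unit: rank the features within each sample (column), then divide
# each column by the square root of the sum of its ranks.
rank_unit <- function(x) {
  r <- apply(x, 2, rank, ties.method = "average")
  sweep(r, 2, sqrt(colSums(r)), "/")
}

# clr: add a pseudocount, take the natural log, and subtract each sample's
# mean log value so that every column is centered at zero.
# Assumption: log.n0 is always supplied; no estimation from the data.
clr <- function(x, log.n0) {
  lx <- log(x + log.n0)
  sweep(lx, 2, colMeans(lx), "-")
}

feat.rank <- rank_unit(feat)
feat.clr <- clr(feat, log.n0 = 1)
```

With four features and no ties, the ranks in each sample sum to 10, so every column of `feat.rank` sums to sqrt(10).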
The entries of the list norm.param specify the normalization parameters, which depend on the chosen normalization method:
"rank.unit": does not require any further parameters
"rank.std": requires sd.min.q, the quantile of the distribution of the standard deviations of all features that is added to the denominator during standardization to avoid underestimating the standard deviation; defaults to 0.1
"clr": requires log.n0, the pseudocount added before log-transformation; defaults to NULL, in which case log.n0 is estimated from the data
"log.std": requires both log.n0 and sd.min.q, with the same default values
"log.unit": requires, in addition to log.n0, the parameters n.p and norm.margin.
n.p specifies the vector norm to be used: 1 for x/sum(x) or 2 for x/sqrt(sum(x^2)).
norm.margin specifies the margin over which to normalize, analogous to the MARGIN argument of apply: 1 for normalization over features, 2 over samples, and 3 for normalization by the global maximum.
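How sd.min.q, log.n0, n.p and norm.margin enter the computation can be illustrated with a base-R sketch of the rank.std, log.std and log.unit descriptions. It is an illustration under stated assumptions, not the package code: features are in rows and samples in columns, ranks for rank.std are taken within each sample before each feature is standardized, log10 is assumed as the log base (the text does not name one), and the helpers `std_rows`, `rank_std`, `log_std` and `log_unit` are hypothetical. Defaults mirror the norm.param values in the usage line.

```r
# Sketch only, not SIAMCAT's code. Assumptions: features in rows, samples in
# columns, log10 as the log base; toy data and helper names are hypothetical.
feat <- matrix(c(0, 5, 10, 20,
                 3, 0,  7,  1,
                 8, 2,  0,  4),
               nrow = 4,
               dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))

# Z-score each feature (row); the sd.min.q quantile of all feature SDs is
# added to every SD so that the denominator is not underestimated.
std_rows <- function(x, sd.min.q = 0.1) {
  m <- rowMeans(x)
  s <- apply(x, 1, sd)
  s.adj <- s + quantile(s, probs = sd.min.q, names = FALSE)
  (x - m) / s.adj  # length-nrow vectors recycle down each column
}

# rank.std: rank within each sample, then standardize each feature.
rank_std <- function(x, sd.min.q = 0.1) {
  std_rows(apply(x, 2, rank, ties.method = "average"), sd.min.q)
}

# log.std: add the pseudocount, log-transform, then standardize.
log_std <- function(x, log.n0 = 1e-08, sd.min.q = 0.1) {
  std_rows(log10(x + log.n0), sd.min.q)
}

# log.unit: log-transform, then divide by the n.p norm over norm.margin
# (1 = each feature, 2 = each sample, 3 = global maximum).
log_unit <- function(x, log.n0 = 1e-08, n.p = 2, norm.margin = 1) {
  stopifnot(n.p %in% c(1, 2))
  lx <- log10(x + log.n0)
  vnorm <- function(v) if (n.p == 1) sum(v) else sqrt(sum(v^2))
  if (norm.margin == 1) {
    lx / apply(lx, 1, vnorm)                # each feature by its own norm
  } else if (norm.margin == 2) {
    sweep(lx, 2, apply(lx, 2, vnorm), "/")  # each sample by its own norm
  } else if (norm.margin == 3) {
    lx / max(lx)                            # whole matrix by its maximum
  } else {
    stop("norm.margin must be 1, 2 or 3")
  }
}

z.rank <- rank_std(feat)
z.log <- log_std(feat)
u.samp <- log_unit(feat, log.n0 = 1, n.p = 2, norm.margin = 2)
u.feat <- log_unit(feat, log.n0 = 1, n.p = 1, norm.margin = 1)
```

Because a positive quantile is added to every standard deviation, each standardized feature ends up with a standard deviation slightly below 1, which damps features whose spread is tiny.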
The function can also perform a frozen normalization on a different dataset. After the first dataset has been normalized, the output list $par contains all parameters of the normalization. Supplying this list together with a new dataset normalizes the second dataset in a way that is comparable to the first (e.g. by using the same feature means for z-score standardization).
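The idea of frozen normalization can be sketched in base R for the log.std case: estimate the parameters once on the first dataset, store them, and reuse them on the second instead of re-estimating. This is not how SIAMCAT stores or applies its $par list; the functions `log_std_fit` and `log_std_frozen` and the entry names `feat.mean` and `feat.adj.sd` are hypothetical, and features are assumed to be in rows with matching row names in both datasets.

```r
# Sketch only, not SIAMCAT's code. Function names and the entries of `par`
# are hypothetical; features in rows, samples in columns.
train <- matrix(c(0, 5, 10, 20,
                  3, 0,  7,  1,
                  8, 2,  0,  4),
                nrow = 4,
                dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))
new.data <- matrix(c(1, 4, 12, 15,
                     0, 2,  9,  3),
                   nrow = 4,
                   dimnames = list(paste0("f", 1:4), paste0("t", 1:2)))

# Normalize the first dataset and keep every estimated parameter.
log_std_fit <- function(x, log.n0 = 1e-08, sd.min.q = 0.1) {
  lx <- log10(x + log.n0)
  m <- rowMeans(lx)
  s <- apply(lx, 1, sd)
  s.adj <- s + quantile(s, probs = sd.min.q, names = FALSE)
  list(feat = (lx - m) / s.adj,
       par = list(log.n0 = log.n0, feat.mean = m, feat.adj.sd = s.adj))
}

# Apply the stored parameters to another dataset without re-estimating them.
log_std_frozen <- function(x, par) {
  # the same features must appear in the same order
  stopifnot(identical(rownames(x), names(par$feat.mean)))
  lx <- log10(x + par$log.n0)
  (lx - par$feat.mean) / par$feat.adj.sd
}

fit <- log_std_fit(train)
new.norm <- log_std_frozen(new.data, fit$par)
```

Reusing the training means and adjusted standard deviations is what keeps the two outputs on the same scale; re-estimating them on the new data would shift and rescale each feature differently.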