xCrosstalk | R Documentation |
xCrosstalk
identifies maximum-scoring pathway crosstalk from an input graph whose
nodes carry significance information (measured as p-values or fdr).
It returns an object of class "cPath".
xCrosstalk(data, entity = c("Gene", "GR"), significance.threshold = NULL,
  score.cap = NULL, build.conversion = c(NA, "hg38.to.hg19", "hg18.to.hg19"),
  crosslink = c("genehancer", "PCHiC_combined", "GTEx_V6p_combined", "nearby"),
  crosslink.customised = NULL, cdf.function = c("original", "empirical"),
  scoring.scheme = c("max", "sum", "sequential"), nearby.distance.max = 50000,
  nearby.decay.kernel = c("rapid", "slow", "linear", "constant"),
  nearby.decay.exponent = 2,
  networks = c("KEGG", "KEGG_metabolism", "KEGG_genetic", "KEGG_environmental",
    "KEGG_cellular", "KEGG_organismal", "KEGG_disease"),
  seed.genes = T, subnet.significance = 0.01, subnet.size = NULL,
  ontologies = c("KEGGenvironmental", "KEGG", "KEGGmetabolism", "KEGGgenetic",
    "KEGGcellular", "KEGGorganismal", "KEGGdisease"),
  size.range = c(10, 2000), min.overlap = 10, fdr.cutoff = 0.05,
  crosstalk.top = NULL, glayout = layout_with_kk, verbose = T,
  RData.location = "http://galahad.well.ox.ac.uk/bigdata")
data |
a named input vector containing the significance level for genes (gene symbols) or genomic regions (GR). For this named vector, the element names are gene symbols or GR (in the format of 'chrN:start-end', where N is either 1-22 or X, and start/end are genomic positions; for example, 'chr1:13-20'), and the element values are the significance levels (measured as p-value or fdr). Alternatively, it can be a matrix or data frame with two columns: 1st column for gene symbols or GR, 2nd column for the significance level. Input with GR only (without the significance level) is also supported |
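As a minimal sketch of the input shapes described above: the gene symbols, regions and significance values below are made up for illustration, and `is_gr` is a hypothetical helper (not part of XGR) that only checks names against the documented 'chrN:start-end' format.

```r
# Made-up significance values keyed by gene symbol (named-vector form)
data_vec <- c(TP53 = 1e-6, STAT1 = 3e-4, CMTM6 = 0.02)

# Equivalent two-column form: 1st column for names, 2nd for significance
data_df <- data.frame(Gene = names(data_vec), Sig = unname(data_vec),
                      stringsAsFactors = FALSE)

# Genomic regions use the 'chrN:start-end' format, with N in 1-22 or X
data_gr <- c("chr1:13-20" = 5e-9, "chrX:1000-1000" = 2e-8)

# Hypothetical helper: check region names against the documented format
is_gr <- function(x) {
  grepl("^chr([1-9]|1[0-9]|2[0-2]|X):[0-9]+-[0-9]+$", x)
}

is_gr(names(data_gr))   # both names follow the format
is_gr("chr23:5-9")      # chromosome 23 is outside 1-22/X
```

Either `data_vec` or `data_df` would be passed with entity="Gene", and `data_gr` with entity="GR".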
entity |
the entity. It can be either "Gene" or "GR" |
significance.threshold |
the given significance threshold. By default, it is set to NULL, meaning there is no constraint on the significance level when transforming the significance level into scores. If given, entries with a significance level below the threshold are considered significant and thus scored positively, whereas entries above it are considered insignificant and thus receive no score |
score.cap |
the maximum score allowed; scores above this value are capped at it. By default, it is set to NULL, meaning that no capping is applied |
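The interplay of significance.threshold and score.cap can be sketched as follows. This page does not state the transformation from significance level to score, so the -log10 transform here is an assumption, and `sig_to_score` is a hypothetical helper rather than an XGR function.

```r
# Hypothetical helper (not part of XGR): significance levels -> scores
sig_to_score <- function(p, significance.threshold = NULL, score.cap = NULL) {
  score <- -log10(p)  # assumed transform; not specified on this page
  if (!is.null(significance.threshold)) {
    # entries not below the threshold receive no score
    # (how ties at the threshold are handled is also an assumption)
    score[!(p < significance.threshold)] <- 0
  }
  if (!is.null(score.cap)) {
    score <- pmin(score, score.cap)  # cap the maximum score
  }
  score
}

p <- c(a = 1e-12, b = 1e-8, c = 1e-3)
sig_to_score(p, significance.threshold = 5e-8, score.cap = 10)
```

With both arguments left as NULL the helper simply returns the transformed values, mirroring the "no constraint" and "no capping" defaults.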
build.conversion |
the conversion from one genome build to another. The conversions supported are "hg38.to.hg19" and "hg18.to.hg19". By default it is NA (no need to do so) |
crosslink |
the built-in crosslink info with a score quantifying
the link of a GR to a gene. See |
crosslink.customised |
the crosslink info with a score quantifying the link of a GR to a gene. A user-input matrix or data frame with 4 columns: 1st column for genomic regions (formatted as "chr:start-end", genome build 19), 2nd column for Genes, 3rd for crosslink score (crosslinking a genomic region to a gene, such as -log10 significance level), and 4th for contexts (optional; if not provided, it will be added as 'C'). Alternatively, it can be a file containing these 4 columns. Required, otherwise it will return NULL |
cdf.function |
a character specifying how to transform the input crosslink score. It can be one of 'original' (no such transformation), and 'empirical' for looking at empirical Cumulative Distribution Function (cdf; as such it is converted into pvalue-like values [0,1]) |
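The two cdf.function options can be illustrated with base R's `stats::ecdf`. The crosslink scores below are made up, and whether XGR computes the empirical CDF in exactly this way is an assumption.

```r
# Made-up crosslink scores for illustration
crosslink_score <- c(0.5, 2.1, 2.1, 7.3, 10)

# 'original': scores are used as they are
score_original <- crosslink_score

# 'empirical': replace each score by its empirical CDF value,
# i.e. the fraction of scores less than or equal to it
score_empirical <- stats::ecdf(crosslink_score)(crosslink_score)
score_empirical
```

The transformed values keep the ordering of the original scores but are bounded by 1, which makes scores from differently scaled crosslink sources comparable.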
scoring.scheme |
the method used to calculate seed gene scores under a set of GR (also over Contexts if many). It can be one of "sum" for adding up, "max" for the maximum, and "sequential" for the sequential weighting. The sequential weighting is done via: \sum_{i=1}^{n} \frac{R_{i}}{i}, where R_{i} is the i^{th} ranked score (in decreasing order) and n is the number of scores |
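The three scoring schemes can be sketched for a single gene as follows. `combine_scores` is a hypothetical helper (not part of XGR), and the GR scores are made up.

```r
# Hypothetical helper: combine the scores of all GR linked to one gene
combine_scores <- function(x, scoring.scheme = c("max", "sum", "sequential")) {
  scoring.scheme <- match.arg(scoring.scheme)
  switch(scoring.scheme,
    max = max(x),
    sum = sum(x),
    sequential = {
      r <- sort(x, decreasing = TRUE)  # R_i: the i-th largest score
      sum(r / seq_along(r))            # sum over i of R_i / i
    }
  )
}

gr_scores <- c(2, 6, 3)  # made-up scores of three GR linked to one gene
combine_scores(gr_scores, "max")
combine_scores(gr_scores, "sum")
combine_scores(gr_scores, "sequential")  # 6/1 + 3/2 + 2/3
```

The sequential scheme sits between the other two: the top-ranked score counts in full, while lower-ranked scores contribute progressively less.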
nearby.distance.max |
the maximum distance between genes and GR. Only those genes no farther than this distance from a GR will be considered as seed genes. This parameter will influence the distance-component weights calculated for nearby GR per gene |
nearby.decay.kernel |
a character specifying a decay kernel function. It can be one of 'slow' for slow decay, 'linear' for linear decay, and 'rapid' for rapid decay. If no distance weight is used, please select 'constant' |
nearby.decay.exponent |
a numeric specifying a decay exponent. By default, it is set to 2 |
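To make the roles of nearby.distance.max, nearby.decay.kernel and nearby.decay.exponent concrete, here is a sketch of distance-based weighting. This page does not give the kernel formulas, so the functional forms below are assumptions chosen only to show "rapid", "slow", "linear" and "constant" behaviour; `decay_weight` is a hypothetical helper, not an XGR function.

```r
# Hypothetical helper: distance-component weight for a GR at distance d
# from a gene. The kernel formulas are assumed for illustration.
decay_weight <- function(d, distance.max = 50000,
                         kernel = c("rapid", "slow", "linear", "constant"),
                         exponent = 2) {
  kernel <- match.arg(kernel)
  x <- pmin(d, distance.max) / distance.max  # scaled distance in [0, 1]
  w <- switch(kernel,
    rapid    = (1 - x)^exponent,   # drops quickly near the gene
    slow     = 1 - x^exponent,     # stays high, drops near the maximum
    linear   = 1 - x,
    constant = rep(1, length(x))   # no distance weighting
  )
  w[d > distance.max] <- 0  # beyond the maximum distance: not considered
  w
}

d <- c(0, 25000, 50000)
sapply(c("rapid", "slow", "linear", "constant"),
       function(k) decay_weight(d, kernel = k))
```

Under these assumed forms, a larger exponent makes the "rapid" kernel fall off faster and the "slow" kernel fall off more slowly.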
networks |
the built-in network. For direct (pathway-merged) interactions sourced from KEGG, it can be 'KEGG' for all, 'KEGG_metabolism' for pathways grouped into 'Metabolism', 'KEGG_genetic' for 'Genetic Information Processing' pathways, 'KEGG_environmental' for 'Environmental Information Processing' pathways, 'KEGG_cellular' for 'Cellular Processes' pathways, 'KEGG_organismal' for 'Organismal Systems' pathways, and 'KEGG_disease' for 'Human Diseases' pathways |
seed.genes |
logical to indicate whether the identified network is restricted to seed genes (ie input genes with the significance level). By default, it is set to true |
subnet.significance |
the given significance threshold. By default, it is set to 0.01, as shown in the usage above. If NULL, there is no constraint on nodes/genes. If given, those nodes/genes with p-values below this are considered significant and thus scored positively, whereas those with p-values above this threshold are considered insignificant and thus scored negatively |
subnet.size |
the desired number of nodes constrained to the resulting subnet. If not NULL, a wide range of significance thresholds will be scanned to find the optimal significance threshold leading to the desired number of nodes in the resulting subnet. Notably, the given significance threshold will be overwritten by this option |
ontologies |
the ontologies supported currently. It can be 'AA' for AA-curated pathways, KEGG pathways (including 'KEGG' for all, 'KEGGmetabolism' for 'Metabolism' pathways, 'KEGGgenetic' for 'Genetic Information Processing' pathways, 'KEGGenvironmental' for 'Environmental Information Processing' pathways, 'KEGGcellular' for 'Cellular Processes' pathways, 'KEGGorganismal' for 'Organismal Systems' pathways, and 'KEGGdisease' for 'Human Diseases' pathways), 'REACTOME' for REACTOME pathways or 'REACTOME_x' for its sub-ontologies (where x can be 'CellCellCommunication', 'CellCycle', 'CellularResponsesToExternalStimuli', 'ChromatinOrganization', 'CircadianClock', 'DevelopmentalBiology', 'DigestionAndAbsorption', 'Disease', 'DNARepair', 'DNAReplication', 'ExtracellularMatrixOrganization', 'GeneExpression(Transcription)', 'Hemostasis', 'ImmuneSystem', 'Metabolism', 'MetabolismOfProteins', 'MetabolismOfRNA', 'Mitophagy', 'MuscleContraction', 'NeuronalSystem', 'OrganelleBiogenesisAndMaintenance', 'ProgrammedCellDeath', 'Reproduction', 'SignalTransduction', 'TransportOfSmallMolecules', 'VesicleMediatedTransport') |
size.range |
the minimum and maximum size of members of each term in consideration. By default, it is set to a minimum of 10 and a maximum of 2000 |
min.overlap |
the minimum number of overlaps. Only those terms with members that overlap with input data at least min.overlap (10 by default) will be processed |
fdr.cutoff |
fdr cutoff used to declare the significant terms. By default, it is set to 0.05 |
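How size.range, min.overlap and fdr.cutoff act together can be sketched with a toy enrichment analysis. This page does not state the test or the multiple-testing correction, so the hypergeometric test and the Benjamini-Hochberg adjustment below are assumptions; `enrich_sketch` and all gene and term names are hypothetical.

```r
# Hypothetical helper: filter terms, test overlap, apply an FDR cutoff.
# Terms are assumed to be subsets of the universe.
enrich_sketch <- function(genes, terms, universe,
                          size.range = c(10, 2000), min.overlap = 10,
                          fdr.cutoff = 0.05) {
  genes <- intersect(genes, universe)
  size <- vapply(terms, length, integer(1))
  overlap <- vapply(terms, function(m) length(intersect(m, genes)),
                    integer(1))
  # keep terms within the size range and with enough overlapping members
  keep <- size >= size.range[1] & size <= size.range[2] &
    overlap >= min.overlap
  # P(overlap >= observed) under the hypergeometric distribution (assumed)
  pval <- stats::phyper(overlap[keep] - 1, size[keep],
                        length(universe) - size[keep], length(genes),
                        lower.tail = FALSE)
  fdr <- stats::p.adjust(pval, method = "BH")
  res <- data.frame(term = names(terms)[keep], size = size[keep],
                    overlap = overlap[keep], pvalue = pval, fdr = fdr,
                    stringsAsFactors = FALSE)
  res[res$fdr < fdr.cutoff, , drop = FALSE]
}

universe <- paste0("g", 1:1000)
terms <- list(small = universe[1:5],      # dropped: fewer than 10 members
              hit   = universe[1:40],     # strongly overlapping term
              miss  = universe[501:560])  # no overlap with the input
genes <- universe[1:30]
enrich_sketch(genes, terms, universe)
```

Terms removed by the size and overlap filters are never tested, so they do not count towards the multiple-testing adjustment in this sketch.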
crosstalk.top |
the number of the top paths to be returned. By default, it is NULL, meaning no such restriction |
glayout |
either a function or a numeric matrix configuring how the vertices will be placed on the plot. If glayout is a function, this function will be called with the graph as the single parameter to determine the actual coordinates. This function can be one of "layout_nicely" (previously "layout.auto"), "layout_randomly" (previously "layout.random"), "layout_in_circle" (previously "layout.circle"), "layout_on_sphere" (previously "layout.sphere"), "layout_with_fr" (previously "layout.fruchterman.reingold"), "layout_with_kk" (previously "layout.kamada.kawai"), "layout_as_tree" (previously "layout.reingold.tilford"), "layout_with_lgl" (previously "layout.lgl"), "layout_with_graphopt" (previously "layout.graphopt"), "layout_with_sugiyama" (previously "layout.sugiyama"), "layout_with_dh" (previously "layout.davidson.harel"), "layout_with_drl" (previously "layout.drl"), "layout_with_gem" (previously "layout.gem"), "layout_with_mds", and "layout_as_bipartite". A full explanation of these layouts can be found in http://igraph.org/r/doc/layout_nicely.html |
verbose |
logical to indicate whether the messages will be displayed on the screen. By default, it is set to true for display |
RData.location |
the characters to tell the location of built-in
RData files. See |
an object of class "cPath", a list with the following components:
ig_paths: an object of class "igraph". It has graph attributes (enrichment, and/or evidence, gp_evidence and membership if entity is 'GR') and node attributes (crosstalk)
gp_paths: a 'ggplot' object for pathway crosstalk visualisation
gp_heatmap: a 'ggplot' object for pathway member gene visualisation
ig_subg: an object of class "igraph".
xDefineNet, xCombineNet, xSubneterGenes, xGR2xNet, xEnricherGenesAdv, xGGnetwork, xHeatmap
## Not run: 
# Load the XGR package and specify the location of built-in data
library(XGR)
RData.location <- "http://galahad.well.ox.ac.uk/bigdata_dev/"

# 1) at the gene level
data(Haploid_regulators)
## only PD-L1 regulators and their significance info (FDR)
data <- subset(Haploid_regulators, Phenotype=='PDL1')[,c('Gene','FDR')]
## pathway crosstalk
cPath <- xCrosstalk(data, entity="Gene", networks="KEGG",
  subnet.significance=0.05, subnet.size=NULL,
  ontologies="KEGGenvironmental", RData.location=RData.location)
cPath
## visualisation
pdf("xCrosstalk_Gene.pdf", width=7, height=8)
gp_both <- gridExtra::grid.arrange(grobs=list(cPath$gp_paths,cPath$gp_heatmap),
  layout_matrix=cbind(c(1,1,1,1,2)))
dev.off()

# 2) at the genomic region (SNP) level
data(ImmunoBase)
## all ImmunoBase GWAS SNPs and their significance info (p-values)
ls_df <- lapply(ImmunoBase, function(x) as.data.frame(x$variant))
df <- do.call(rbind, ls_df)
data <- unique(cbind(GR=paste0(df$seqnames,':',df$start,'-',df$end),
  Sig=df$Pvalue))
## pathway crosstalk
df_xGenes <- xGR2xGenes(data[as.numeric(data[,2])<5e-8,1],
  format="chr:start-end", crosslink="PCHiC_combined", scoring=TRUE,
  RData.location=RData.location)
mSeed <- xGR2xGeneScores(data, significance.threshold=5e-8,
  crosslink="PCHiC_combined", RData.location=RData.location)
subg <- xGR2xNet(data, significance.threshold=5e-8,
  crosslink="PCHiC_combined", network="KEGG", subnet.significance=0.1,
  RData.location=RData.location)
cPath <- xCrosstalk(data, entity="GR", significance.threshold=5e-8,
  crosslink="PCHiC_combined", networks="KEGG", subnet.significance=0.1,
  ontologies="KEGGenvironmental", RData.location=RData.location)
cPath
## visualisation
pdf("xCrosstalk_SNP.pdf", width=7, height=8)
gp_both <- gridExtra::grid.arrange(grobs=list(cPath$gp_paths,cPath$gp_heatmap),
  layout_matrix=cbind(c(1,1,1,1,2)))
dev.off()

# 3) at the genomic region (without the significance info) level
Age_CpG <- xRDataLoader(RData.customised='Age_CpG',
  RData.location=RData.location)[-1,1]
CgProbes <- xRDataLoader(RData.customised='CgProbes',
  RData.location=RData.location)
ind <- match(Age_CpG, names(CgProbes))
gr_CpG <- CgProbes[ind[!is.na(ind)]]
df <- as.data.frame(gr_CpG)
data <- paste0(df$seqnames,':',df$start,'-',df$end)
## pathway crosstalk
df_xGenes <- xGR2xGenes(data, format="chr:start-end",
  crosslink="PCHiC_combined", scoring=TRUE, RData.location=RData.location)
subg <- xGR2xNet(data, crosslink="PCHiC_combined", network="KEGG",
  subnet.significance=0.1, RData.location=RData.location)
cPath <- xCrosstalk(data, entity="GR", crosslink="PCHiC_combined",
  networks="KEGG", subnet.significance=0.1,
  ontologies="KEGGenvironmental", RData.location=RData.location)
cPath

## End(Not run)