biomartr
Phylotranscriptomics is a sub-field of Evolutionary Developmental Biology research that investigates the evolutionary conservation of transcriptomes during development. The term Phylotranscriptomics describes the combination of evolutionary information with transcriptome data to capture signals of evolutionary conservation.
The Introduction to Phylotranscriptomics vignette provided by the myTAI package gives users a broad overview of phylotranscriptomics as a scientific discipline.
The biomartr package aims to provide an interface for functional annotation queries. Hence, combining functional annotation with the detection of evolutionary signals in biological processes with orthologr and myTAI allows users to correlate biological functions and constraints with their evolutionary origin.
This vignette aims to provide use cases for combining functional annotation of genes (biomartr) with their role in development (myTAI) in an evolutionary context (orthologr).
Phylotranscriptomics aims to predict stages or periods of evolutionary conservation in biological processes on the transcriptome level. Beyond that, finding genes that share a common evolutionary history could reveal how the biological process might have evolved in the first place.
In this use case we will combine functional and biological annotation obtained with biomartr with enriched genes obtained with PlotEnrichment().
For the following example we will use the dataset and enrichment analysis from the PlotEnrichment() documentation.
Install and load the myTAI package:
# install myTAI
install.packages("myTAI")
# load myTAI
library(myTAI)
Download the Phylostratigraphic Map of D. rerio:
# download the Phylostratigraphic Map of Danio rerio
# from Sestak and Domazet-Loso, 2015
download.file(url      = "http://mbe.oxfordjournals.org/content/suppl/2014/11/17/msu319.DC1/TableS3-2.xlsx",
              destfile = "MBE_2015a_Drerio_PhyloMap.xlsx",
              mode     = "wb") # binary mode keeps the *.xlsx file intact on Windows
Read the *.xlsx file storing the Phylostratigraphic Map of D. rerio and format it for use with myTAI:
# install the readxl package
install.packages("readxl")
# load package readxl
library(readxl)
# read the excel file
DrerioPhyloMap.MBEa <- read_excel("MBE_2015a_Drerio_PhyloMap.xlsx", sheet = 1, skip = 4)
# format Phylostratigraphic Map for use with myTAI
Drerio.PhyloMap <- DrerioPhyloMap.MBEa[ , 1:2]
# have a look at the final format
head(Drerio.PhyloMap)
  Phylostrata            ZFIN_ID
1           1 ZDB-GENE-000208-13
2           1 ZDB-GENE-000208-17
3           1 ZDB-GENE-000208-18
4           1 ZDB-GENE-000208-23
5           1  ZDB-GENE-000209-3
6           1  ZDB-GENE-000209-4
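Before a map is used as a background set, it can help to check that it has no missing values or duplicated gene ids, and to see how many genes fall into each phylostratum. The following is a minimal sketch on a toy data frame that only mimics the two-column layout shown above; `toy.PhyloMap`, `map.is.clean` and `genes.per.PS` are hypothetical names introduced for illustration, and the rows are placeholders rather than real assignments.

```r
# toy phylostratigraphic map with the same two columns as Drerio.PhyloMap
# (illustrative placeholder rows, not real phylostratum assignments)
toy.PhyloMap <- data.frame(
  Phylostrata = c(1, 1, 1, 2, 2, 5),
  ZFIN_ID     = c("ZDB-GENE-000208-13", "ZDB-GENE-000208-17", "ZDB-GENE-000208-18",
                  "ZDB-GENE-000209-3", "ZDB-GENE-000209-4", "ZDB-GENE-000210-6"),
  stringsAsFactors = FALSE
)

# a clean background set has no missing values and no duplicated gene ids
map.is.clean <- !any(is.na(toy.PhyloMap)) && !any(duplicated(toy.PhyloMap$ZFIN_ID))

# number of genes assigned to each phylostratum
genes.per.PS <- table(toy.PhyloMap$Phylostrata)
print(genes.per.PS)
```

The same two checks can be run on Drerio.PhyloMap once it has been read from the *.xlsx file.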
Now, Drerio.PhyloMap stores the Phylostratigraphic Map of D. rerio, which is used as the background set to perform enrichment analyses with PlotEnrichment() from myTAI.
The PlotEnrichment() function visualizes the over- and underrepresented Phylostrata of brain specific genes when compared with the total number of genes stored in the Phylostratigraphic Map of D. rerio.
library(readxl)
# read expression data (organ specific genes) from Sestak and Domazet-Loso, 2015
Drerio.OrganSpecificExpression <- read_excel("MBE_2015a_Drerio_PhyloMap.xlsx", sheet = 2, skip = 3)
# select only brain specific genes
Drerio.Brain.Genes <- unique(na.omit(Drerio.OrganSpecificExpression[ , "brain"]))
# visualize enriched Phylostrata of genes annotated as brain specific
PlotEnrichment(Drerio.PhyloMap,
               test.set     = Drerio.Brain.Genes,
               measure      = "foldchange",
               use.only.map = TRUE,
               legendName   = "PS")
Users will observe that, for example, brain genes originating in PS5 are significantly enriched.
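To make the idea of an enriched phylostratum concrete, the calculation can be sketched by hand for a single phylostratum. This sketch assumes that the fold change is the proportion of test set genes in the phylostratum divided by the corresponding proportion in the background, and that significance is assessed with Fisher's exact test on a 2 x 2 contingency table. The exact definitions used internally by PlotEnrichment() may differ, and all counts below are made up.

```r
# hypothetical counts (illustrative only, not taken from the D. rerio data)
n.background    <- 1000  # genes in the background map
n.background.PS <- 100   # background genes assigned to the phylostratum of interest
n.test          <- 50    # genes in the test set (e.g. brain specific genes)
n.test.PS       <- 15    # test set genes assigned to the phylostratum of interest

# 2 x 2 contingency table: test set vs. remaining background, in PS vs. not in PS
contingency <- matrix(
  c(n.test.PS,                   n.test - n.test.PS,
    n.background.PS - n.test.PS, (n.background - n.background.PS) - (n.test - n.test.PS)),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("test.set", "remaining"), c("in.PS", "not.in.PS"))
)

# fold change: proportion in the test set relative to the proportion in the background
fold.change <- (n.test.PS / n.test) / (n.background.PS / n.background)

# two-sided Fisher's exact test for over- or underrepresentation
enrichment.test <- fisher.test(contingency)

print(fold.change)
print(enrichment.test$p.value)
```

A fold change above 1 together with a small p-value indicates overrepresentation of the phylostratum in the test set; a fold change below 1 indicates underrepresentation.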
Now we can select all brain genes originating in PS5 using the SelectGeneSet() function from myTAI. Please note that SelectGeneSet() can be applied to phylostratigraphic maps alone (use.only.map = TRUE argument) only in myTAI versions > 0.3.0.
BrainGenes <- SelectGeneSet(ExpressionSet = Drerio.PhyloMap,
                            gene.set      = Drerio.Brain.Genes,
                            use.only.map  = TRUE)
# select only brain genes originating in PS5
BrainGenes.PS5 <- BrainGenes[which(BrainGenes[ , "Phylostrata"] == 5), ]
# look at the results
head(BrainGenes.PS5)
      Phylostrata           ZFIN_ID
14851           5 ZDB-GENE-000210-6
14852           5 ZDB-GENE-000210-7
14853           5 ZDB-GENE-000328-4
14856           5 ZDB-GENE-000411-1
14857           5 ZDB-GENE-000427-4
14860           5 ZDB-GENE-000526-1
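Conceptually, this selection step matches gene ids against the map and then filters by phylostratum. The following base R sketch shows that logic on toy data; it is not the implementation of SelectGeneSet(), and the names `toy.PhyloMap`, `toy.brain.genes` and the `gene_*` ids are hypothetical placeholders.

```r
# toy map and toy gene set (placeholder ids, not real ZFIN ids)
toy.PhyloMap <- data.frame(
  Phylostrata = c(1, 1, 5, 5, 5, 7),
  ZFIN_ID     = c("gene_a", "gene_b", "gene_c", "gene_d", "gene_e", "gene_f"),
  stringsAsFactors = FALSE
)
toy.brain.genes <- c("gene_c", "gene_d", "gene_f", "gene_x")

# keep only map entries whose gene id is part of the gene set
toy.selected <- toy.PhyloMap[toy.PhyloMap$ZFIN_ID %in% toy.brain.genes, ]

# restrict the selection to genes originating in PS5
toy.selected.PS5 <- toy.selected[toy.selected$Phylostrata == 5, ]

# gene ids of the set that have no entry in the map
toy.missing <- setdiff(toy.brain.genes, toy.PhyloMap$ZFIN_ID)

print(toy.selected.PS5)
print(toy.missing)
```

Checking for ids that are absent from the map, as in `toy.missing`, shows how many genes of the set cannot contribute to the enrichment analysis.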
Now users can use the biomart() function to obtain the functional annotation of brain genes originating in PS5.
For this purpose, we first need to find the filter name that corresponds to gene ids such as ZDB-GENE-000210-6.
# find filter for zfin.org ids
organismFilters("Danio rerio", topic = "zfin_id")
Source: local data frame [3 x 4]
  name                         description                           mart    dataset
1 with_zfin_id                 with ZFIN ID(s)                       ensembl drerio_gene_ensembl
2 with_zfin_id_transcript_name with ZFIN transcript name(s)          ensembl drerio_gene_ensembl
3 zfin_id                      ZFIN ID(s) [e.g. ZDB-GENE-060825-136] ensembl drerio_gene_ensembl
Now users can look up the corresponding GO attribute of D. rerio with organismAttributes().
# find go attribute term for D. rerio
organismAttributes("Danio rerio", topic = "go")
Source: local data frame [18 x 4]
   name                                          description                             mart
 1 ggorilla_homolog_canomical_transcript_protein Canonical Protein or Transcript ID      ensembl
 2 ggorilla_homolog_chrom_end                    Gorilla Chromosome End (bp)             ensembl
 3 ggorilla_homolog_chrom_start                  Gorilla Chromosome Start (bp)           ensembl
 4 ggorilla_homolog_chromosome                   Gorilla Chromosome Name                 ensembl
 5 ggorilla_homolog_dn                           dN                                      ensembl
 6 ggorilla_homolog_ds                           dS                                      ensembl
 7 ggorilla_homolog_ensembl_gene                 Gorilla Ensembl Gene ID                 ensembl
 8 ggorilla_homolog_ensembl_peptide              Gorilla Ensembl Protein ID              ensembl
 9 ggorilla_homolog_orthology_confidence         Orthology confidence [0 low, 1 high]    ensembl
10 ggorilla_homolog_orthology_type               Homology Type                           ensembl
11 ggorilla_homolog_perc_id                      % Identity with respect to query gene   ensembl
12 ggorilla_homolog_perc_id_r1                   % Identity with respect to Gorilla gene ensembl
13 ggorilla_homolog_subtype                      Ancestor                                ensembl
14 go_id                                         GO ID                                   ensembl
15 go_linkage_type                               GO Term Evidence Code                   ensembl
16 goslim_goa_accession                          GOSlim GOA Accession(s)                 ensembl
17 goslim_goa_description                        GOSlim GOA Description                  ensembl
18 quick_go                                      Quick GO ID                             ensembl
Variables not shown: dataset (chr)
Now users can specify the filter zfin_id and the attribute go_id to retrieve the GO terms of the corresponding gene ids (please note that this will take some time).
# retrieve GO terms of D. rerio brain genes originating in PS5
GO_tbl.BrainGenes <- biomart(genes      = BrainGenes.PS5[ , "ZFIN_ID"],
                             mart       = "ensembl",
                             dataset    = "drerio_gene_ensembl",
                             attributes = "go_id",
                             filters    = "zfin_id")
head(GO_tbl.BrainGenes)
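Once the GO terms have been retrieved, a natural next step is to count how often each term occurs among the selected genes. The sketch below uses a toy table in place of the real query result. It assumes that the result has one row per gene and GO term pair, with columns named `zfin_id` and `go_id`; the actual column names returned by biomart() may differ. The gene to term assignments are placeholders for illustration only.

```r
# toy stand-in for the table returned by the query
# (column names and gene-to-term assignments are assumptions, not real output)
toy.GO_tbl <- data.frame(
  zfin_id = c("ZDB-GENE-000210-6", "ZDB-GENE-000210-6",
              "ZDB-GENE-000210-7", "ZDB-GENE-000328-4"),
  go_id   = c("GO:0005634", "GO:0003677", "GO:0005634", ""),
  stringsAsFactors = FALSE
)

# drop rows without a GO annotation (missing or empty go_id)
annotated <- toy.GO_tbl[!is.na(toy.GO_tbl$go_id) & nzchar(toy.GO_tbl$go_id), ]

# count the number of rows per GO term, most frequent first
go.counts <- sort(table(annotated$go_id), decreasing = TRUE)
print(go.counts)
```

Applied to GO_tbl.BrainGenes, such a frequency table gives a first impression of which functions dominate among brain genes originating in PS5.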