Introduction

Phylotranscriptomics is a sub-field of evolutionary developmental biology that investigates the evolutionary conservation of transcriptomes during development. The term denotes the combination of evolutionary information with transcriptome data to capture signals of evolutionary conservation.

The Introduction to Phylotranscriptomics vignette provided by the myTAI package gives users a broad overview of phylotranscriptomics as a scientific discipline.

The biomartr package aims to provide an interface for functional annotation queries. Hence, combining functional annotation with the detection of evolutionary signals in biological processes with orthologr and myTAI allows users to correlate biological functions and constraints with their evolutionary origin.

This vignette aims to provide use cases for combining functional annotation of genes (biomartr) with their role in development (myTAI) in an evolutionary context (orthologr).

Use Case #1: Functional Annotation of Genes Sharing a Common Evolutionary History

Phylotranscriptomics aims to predict stages or periods of evolutionary conservation in biological processes on the transcriptome level. In addition, finding genes that share a common evolutionary history could reveal how the biological process might have evolved in the first place.

In this Use Case we will combine functional and biological annotation obtained with biomartr with enriched genes obtained with PlotEnrichment().

Step 1

For the following example we will use the dataset and enrichment analysis from the PlotEnrichment() documentation.

Install and load the myTAI package:

# install myTAI
install.packages("myTAI")

# load myTAI
library(myTAI)

Download the Phylostratigraphic Map of D. rerio:

# download the Phylostratigraphic Map of Danio rerio
# from Sestak and Domazet-Loso, 2015
download.file( url      = "http://mbe.oxfordjournals.org/content/suppl/2014/11/17/msu319.DC1/TableS3-2.xlsx", 
               destfile = "MBE_2015a_Drerio_PhyloMap.xlsx" )

Read the *.xlsx file storing the Phylostratigraphic Map of D. rerio and format it for the use with myTAI:

# install the readxl package
install.packages("readxl")

# load package readxl
library(readxl)

# read the excel file
DrerioPhyloMap.MBEa <- read_excel("MBE_2015a_Drerio_PhyloMap.xlsx", sheet = 1, skip = 4)

# format Phylostratigraphic Map for use with myTAI
Drerio.PhyloMap <- DrerioPhyloMap.MBEa[ , 1:2]

# have a look at the final format
head(Drerio.PhyloMap)
  Phylostrata            ZFIN_ID
1           1 ZDB-GENE-000208-13
2           1 ZDB-GENE-000208-17
3           1 ZDB-GENE-000208-18
4           1 ZDB-GENE-000208-23
5           1  ZDB-GENE-000209-3
6           1  ZDB-GENE-000209-4

Drerio.PhyloMap now stores the Phylostratigraphic Map of D. rerio, which serves as the background set for enrichment analyses with PlotEnrichment() from myTAI.
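Before using a map as a background set, it can help to check its structure. The following is a minimal sketch on a toy map: the object name toy.PhyloMap and all of its values are hypothetical and only mimic the two-column layout shown above.

```r
# Toy phylostratigraphic map (hypothetical values, same two-column layout as above)
toy.PhyloMap <- data.frame(
  Phylostrata = c(1, 1, 1, 2, 2, 5, 5, 5, 5, 12),
  ZFIN_ID     = paste0("ZDB-GENE-TOY-", 1:10),
  stringsAsFactors = FALSE
)

# each gene id should occur only once in the map
stopifnot(anyDuplicated(toy.PhyloMap$ZFIN_ID) == 0)

# number of genes assigned to each phylostratum (the background distribution)
ps.counts <- table(toy.PhyloMap$Phylostrata)
print(ps.counts)
```

The same two checks can be run on Drerio.PhyloMap once the file has been read.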

Enrichment Analyses

Now, the PlotEnrichment() function visualizes the over- and underrepresented Phylostrata of brain specific genes when compared with the total number of genes stored in the Phylostratigraphic Map of D. rerio.

library(readxl)

# read expression data (organ specific genes) from Sestak and Domazet-Loso, 2015
Drerio.OrganSpecificExpression <- read_excel("MBE_2015a_Drerio_PhyloMap.xlsx", sheet = 2, skip = 3)

# select only brain specific genes
# [["brain"]] extracts the column as a plain vector (works for data frames and tibbles)
Drerio.Brain.Genes <- unique(na.omit(Drerio.OrganSpecificExpression[["brain"]]))

# visualize enriched Phylostrata of genes annotated as brain specific
PlotEnrichment(Drerio.PhyloMap,
               test.set     = Drerio.Brain.Genes,
               measure      = "foldchange",
               use.only.map = TRUE,
               legendName   = "PS")

Users will observe that, for example, brain genes originating in PS5 are significantly enriched.
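To make the idea of over- and underrepresentation concrete, here is a minimal sketch on toy data. It uses one common definition of fold change (share of a phylostratum in the test set divided by its share in the background) and a per-phylostratum Fisher's exact test. All object names and gene ids are hypothetical, and the sketch is not meant to reproduce the internals of PlotEnrichment(), which may differ in detail.

```r
# Toy background map: 50 genes in PS1, 30 in PS2, 20 in PS5 (hypothetical ids)
background <- data.frame(
  Phylostrata = rep(c(1, 2, 5), times = c(50, 30, 20)),
  GeneID      = paste0("gene", 1:100),
  stringsAsFactors = FALSE
)

# Toy test set: 5 genes from PS1, 2 from PS2 and 13 from PS5
test.set <- c(paste0("gene", 1:5), paste0("gene", 51:52), paste0("gene", 81:93))

# phylostrata as a factor so that empty levels are kept when subsetting
ps      <- factor(background$Phylostrata)
in.test <- background$GeneID %in% test.set

# share of each phylostratum in the test set and in the background
p.test <- as.numeric(prop.table(table(ps[in.test])))
p.bg   <- as.numeric(prop.table(table(ps)))

# fold change > 1: overrepresented, < 1: underrepresented
fold.change <- p.test / p.bg
names(fold.change) <- levels(ps)

# Fisher's exact test on a 2 x 2 table for each phylostratum
p.values <- sapply(levels(ps), function(level) {
  in.ps <- factor(ps == level, levels = c(FALSE, TRUE))
  fisher.test(table(in.ps, factor(in.test, levels = c(FALSE, TRUE))))$p.value
})

print(round(fold.change, 2))
print(signif(p.values, 3))
```

In this toy example PS5 makes up 20% of the background but 65% of the test set, so its fold change is well above 1.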

Now we can select all brain genes originating in PS5 using the SelectGeneSet() function from myTAI. Please note that SelectGeneSet() accepts a phylostratigraphic map on its own (use.only.map = TRUE argument) only in myTAI versions > 0.3.0.

BrainGenes <- SelectGeneSet(ExpressionSet = Drerio.PhyloMap,
                            gene.set      = Drerio.Brain.Genes,
                            use.only.map  = TRUE)

# select only brain genes originating in PS5
BrainGenes.PS5 <- BrainGenes[which(BrainGenes[ , "Phylostrata"] == 5), ]

# look at the results
head(BrainGenes.PS5)
      Phylostrata           ZFIN_ID
14851           5 ZDB-GENE-000210-6
14852           5 ZDB-GENE-000210-7
14853           5 ZDB-GENE-000328-4
14856           5 ZDB-GENE-000411-1
14857           5 ZDB-GENE-000427-4
14860           5 ZDB-GENE-000526-1

Now users can call the biomart() function to obtain the functional annotation of brain genes originating in PS5.

For this purpose, first we need to find the filter name of the corresponding gene ids such as ZDB-GENE-000210-6.

# find filter for zfin.org ids
organismFilters("Danio rerio", topic = "zfin_id")
Source: local data frame [3 x 4]

                          name                           description    mart             dataset
1                 with_zfin_id                       with ZFIN ID(s) ensembl drerio_gene_ensembl
2 with_zfin_id_transcript_name          with ZFIN transcript name(s) ensembl drerio_gene_ensembl
3                      zfin_id ZFIN ID(s) [e.g. ZDB-GENE-060825-136] ensembl drerio_gene_ensembl

Now users can retrieve the corresponding GO attribute of D. rerio with organismAttributes.

# find go attribute term for D. rerio
organismAttributes("Danio rerio", topic = "go")
Source: local data frame [18 x 4]

                                            name                             description    mart
1  ggorilla_homolog_canomical_transcript_protein      Canonical Protein or Transcript ID ensembl
2                     ggorilla_homolog_chrom_end             Gorilla Chromosome End (bp) ensembl
3                   ggorilla_homolog_chrom_start           Gorilla Chromosome Start (bp) ensembl
4                    ggorilla_homolog_chromosome                 Gorilla Chromosome Name ensembl
5                            ggorilla_homolog_dn                                      dN ensembl
6                            ggorilla_homolog_ds                                      dS ensembl
7                  ggorilla_homolog_ensembl_gene                 Gorilla Ensembl Gene ID ensembl
8               ggorilla_homolog_ensembl_peptide              Gorilla Ensembl Protein ID ensembl
9          ggorilla_homolog_orthology_confidence    Orthology confidence [0 low, 1 high] ensembl
10               ggorilla_homolog_orthology_type                           Homology Type ensembl
11                      ggorilla_homolog_perc_id   % Identity with respect to query gene ensembl
12                   ggorilla_homolog_perc_id_r1 % Identity with respect to Gorilla gene ensembl
13                      ggorilla_homolog_subtype                                Ancestor ensembl
14                                         go_id                                   GO ID ensembl
15                               go_linkage_type                   GO Term Evidence Code ensembl
16                          goslim_goa_accession                 GOSlim GOA Accession(s) ensembl
17                        goslim_goa_description                  GOSlim GOA Description ensembl
18                                      quick_go                             Quick GO ID ensembl
Variables not shown: dataset (chr)

Now users can specify the filter zfin_id and the attribute go_id to retrieve the GO terms of the corresponding gene ids (please note that this query may take some time).

# retrieve GO terms of D. rerio brain genes originating in PS5
GO_tbl.BrainGenes <- biomart(genes      = BrainGenes.PS5[ , "ZFIN_ID"],
                             mart       = "ensembl",
                             dataset    = "drerio_gene_ensembl",
                             attributes = "go_id",
                             filters    = "zfin_id")

head(GO_tbl.BrainGenes)
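Once the GO terms are retrieved, a natural next step is to see which terms occur most often among the selected genes. The sketch below uses a toy table in place of GO_tbl.BrainGenes. The column names zfin_id and go_id are an assumption based on the filter and attribute used in the query, and all gene ids and GO ids are hypothetical placeholders.

```r
# Toy stand-in for the retrieved table (hypothetical rows; column names assumed)
toy.GO_tbl <- data.frame(
  zfin_id = c("ZDB-GENE-TOY-1", "ZDB-GENE-TOY-1", "ZDB-GENE-TOY-2",
              "ZDB-GENE-TOY-3", "ZDB-GENE-TOY-3", "ZDB-GENE-TOY-4"),
  go_id   = c("GO:TOY-A", "GO:TOY-B", "GO:TOY-A",
              "GO:TOY-A", "", "GO:TOY-B"),
  stringsAsFactors = FALSE
)

# drop rows without a GO annotation (NA or empty string)
annotated <- toy.GO_tbl[!is.na(toy.GO_tbl$go_id) & toy.GO_tbl$go_id != "", ]

# count in how many distinct genes each GO term occurs, most frequent first
go.counts <- sort(table(unique(annotated)$go_id), decreasing = TRUE)
print(go.counts)
```

Counting over unique gene and term pairs avoids inflating a term that is listed several times for the same gene.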