Author: Tal Galili ( Tal.Galili@gmail.com )
A heatmap is a popular graphical method for visualizing high-dimensional data, in which a table of numbers is encoded as a grid of colored cells. The rows and columns of the matrix are ordered to highlight patterns and are often accompanied by dendrograms. Heatmaps are used in many fields for visualizing observations, correlations, missing-value patterns, and more.
Interactive heatmaps allow the inspection of specific values by hovering the mouse over a cell, as well as zooming into a region of the heatmap by dragging a rectangle around the relevant area.
This work is based on the ggplot2 and plotly.js engines. It produces heatmaps similar to those of d3heatmap, with the advantage of speed (plotly.js can handle larger matrices) and the ability to zoom from the dendrogram.
To install the stable version on CRAN:
install.packages('heatmaply')
To install the GitHub version:
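# You'll need devtools
# character.only = TRUE is required because pkg holds the package name as a string
install.packages.2 <- function (pkg) if (!require(pkg, character.only = TRUE)) install.packages(pkg)
install.packages.2('devtools')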
# make sure you have Rtools installed first! if not, then run:
#install.packages('installr'); install.Rtools()
devtools::install_github("ropensci/plotly")
devtools::install_github('talgalili/heatmaply')
And then you may load the package using:
library("heatmaply")
library(heatmaply)
heatmaply(mtcars)
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
Because the labels are somewhat long, we need to fix the margins manually (hopefully this will be fixed in future versions of plot.ly):
heatmaply(mtcars) %>% layout(margin = list(l = 130, b = 40))
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
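The left margin of 130 above was chosen by hand. As a rough heuristic (not part of heatmaply or plotly), a starting value can be derived from the longest row label. The helper name `guess_left_margin` and the figure of 7 pixels per character are assumptions made for illustration only; the right value depends on the font and its size.

```r
# Hypothetical helper: estimate a left margin (in pixels) from the longest label.
# px_per_char = 7 is an assumed average character width, not a plotly constant.
guess_left_margin <- function(labels, px_per_char = 7) {
  px_per_char * max(nchar(labels))
}

left <- guess_left_margin(rownames(mtcars))
left

# The value could then be passed on, for example:
# heatmaply(mtcars) %>% layout(margin = list(l = left, b = 40))
```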
We can use this with correlation. Notice the use of limits to set the range of the colors, and how we color the branches:
heatmaply(cor(mtcars),
k_col = 2, k_row = 2,
limits = c(-1,1)) %>%
layout(margin = list(l = 40, b = 40))
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
heatmaply uses the seriation package to find an optimal ordering of rows and columns. Optimal here means optimizing the Hamiltonian path length that is restricted by the dendrogram structure. In other words, the branches are rotated so that the sum of distances between each pair of adjacent leaves (labels) is minimized. This is related to a restricted version of the travelling salesman problem. The default option is “OLO” (optimal leaf ordering), which optimizes the above-mentioned criterion (it works in O(n^4)). Another option is “GW” (Gruvaeus and Wainer), which aims for the same goal but uses a (faster?) heuristic. The option “mean” gives the output we would get by default from heatmap functions in other packages, such as gplots::heatmap.2. The option “none” gives us the dendrograms without any rotation.
# The default of heatmaply:
heatmaply(mtcars[1:10,], seriate = "OLO") %>% layout(margin = list(l = 130, b = 40))
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
# Similar to OLO but less optimal (since it is a heuristic)
heatmaply(mtcars[1:10,], seriate = "GW") %>% layout(margin = list(l = 130, b = 40))
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
# the default of gplots::heatmap.2
heatmaply(mtcars[1:10,], seriate = "mean") %>% layout(margin = list(l = 130, b = 40))
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
# the default output from hclust
heatmaply(mtcars[1:10,], seriate = "none") %>% layout(margin = list(l = 130, b = 40))
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
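The criterion that the seriation options try to improve, the sum of distances between adjacent leaves, can be sketched in base R without heatmaply or seriation. The function name `path_length` is a hypothetical helper, and `hclust()` is called with its default linkage, which may differ from what heatmaply uses internally. The sketch compares the leaf order returned by `hclust()` with the order obtained by rotating branches according to row means, as `stats::heatmap` does.

```r
# Hypothetical helper: sum of distances between adjacent leaves for a leaf order.
path_length <- function(d, ord) {
  m <- as.matrix(d)
  sum(m[cbind(ord[-length(ord)], ord[-1])])
}

x  <- as.matrix(mtcars[1:10, ])
d  <- dist(x)
hc <- hclust(d)

# Leaf order exactly as returned by hclust (no rotation of branches)
ord_none <- hc$order

# Branches rotated by row means, as stats::heatmap does by default
dend_mean <- reorder(as.dendrogram(hc), rowMeans(x))
ord_mean  <- order.dendrogram(dend_mean)

# Smaller values mean that neighbouring rows are more similar overall
path_length(d, ord_none)
path_length(d, ord_mean)
```

Both orders are valid for the same dendrogram; they differ only in how the branches are flipped, which is the freedom that an optimal leaf ordering exploits.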
This work relies heavily on the seriation package (its vignette is well worth the read), and also lightly on the dendextend package (see its vignette).
We can use colors other than the default viridis. For example, we may want to use other color palettes in order to get divergent colors for the correlations (these will sadly be less friendly for color-blind people):
# divergent_viridis_magma <- c(rev(viridis(100, begin = 0.3)), magma(100, begin = 0.3))
# rwb <- colorRampPalette(colors = c("darkred", "white", "darkgreen"))
library(RColorBrewer)
# display.brewer.pal(11, "BrBG")
BrBG <- colorRampPalette(brewer.pal(11, "BrBG"))
Spectral <- colorRampPalette(brewer.pal(11, "Spectral"))
heatmaply(cor(mtcars),
k_col = 2, k_row = 2,
colors = BrBG(256),
limits = c(-1,1)) %>%
layout(margin = list(l = 40, b = 40))
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
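A diverging palette is only meaningful for correlations if its neutral color lands on zero, which is why the example pairs it with limits = c(-1, 1). A minimal base R sketch of that idea follows. The palette name `rwb` and its three anchor colors are illustrative choices, not heatmaply defaults, and the sketch assumes the colors are spread evenly across the limits.

```r
# colorRampPalette() returns a function that interpolates between the anchors.
rwb <- colorRampPalette(c("darkred", "white", "darkblue"))

# An odd number of colors gives the palette a single middle entry.
n    <- 257
cols <- rwb(n)
mid  <- cols[(n + 1) / 2]   # the color in the middle of the palette
mid

# With symmetric limits, zero sits at the midpoint of the range,
# so it is drawn with the middle (neutral) color.
limits <- c(-1, 1)
mean(limits)

# heatmaply(cor(mtcars), colors = cols, limits = limits)
```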
Another example for using colors:
heatmaply(mtcars, colors = heat.colors(100))
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
Or even more customized colors using scale_fill_gradient_fun:
heatmaply(mtcars,
scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(low = "blue", high = "red", midpoint = 200, limits = c(0, 500)))
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
Reviewing missing values:
library(heatmaply)
class_to <- function(x, new_class) {
class(x) <- new_class
x
}
na_mat <- function(x) {
x %>% is.na %>% class_to("numeric")
}
airquality %>% na_mat %>%
heatmaply(colors = c("white","black"), grid_color = "grey",
k_col = 3, k_row = 3) %>%
layout(margin = list(l = 40, b = 50))
#> Warning: No trace type specified and no positional attributes specified
#> No trace type specified. Applying `add_markers()`.
#> Read more about this trace type here -> https://plot.ly/r/reference/#scatter
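The class_to() helper above turns the logical matrix from is.na() into a numeric one. A shorter base R route, shown here as an alternative sketch rather than the package's own method, is to multiply by 1, which keeps the row and column names. The column sums then give the number of missing values per variable.

```r
# is.na() on a data frame returns a logical matrix; multiplying by 1 makes it 0/1.
na_numeric <- 1 * is.na(airquality)

storage.mode(na_numeric)   # "double"
dim(na_numeric)            # same shape as airquality

# Number of missing values in each column
na_counts <- colSums(na_numeric)
na_counts
```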
This package exists thanks to the amazing work done by MANY people in the open source community. Beyond the many people working on the pipeline of R, thanks should go to the plotly team, and especially to Carson Sievert and others working on the R package of plotly. Also, many of the design elements were inspired by the work done on heatmap, heatmap.2 and d3heatmap, so special thanks go to the R core team, Gregory R. Warnes, and Joe Cheng from RStudio. The dendrogram side of the package is based on the work in dendextend, for which special thanks should go to Andrie de Vries for his original work on bringing dendrograms to ggplot2 (which evolved into the richer ggdend objects, as implemented in dendextend).
You can see the most recent changes to the package in the NEWS.md file.
sessionInfo()
#> R version 3.3.0 (2016-05-03)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 7 x64 (build 7601) Service Pack 1
#>
#> locale:
#> [1] LC_COLLATE=C LC_CTYPE=Hebrew_Israel.1255
#> [3] LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C
#> [5] LC_TIME=Hebrew_Israel.1255
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] RColorBrewer_1.1-2 knitr_1.13 heatmaply_0.5.0
#> [4] viridis_0.3.4 plotly_4.0.0 ggplot2_2.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtools_3.5.0 modeltools_0.2-21 reshape2_1.4.1
#> [4] kernlab_0.9-24 lattice_0.20-33 colorspace_1.2-6
#> [7] htmltools_0.3.5 stats4_3.3.0 viridisLite_0.1.3
#> [10] yaml_2.1.13 base64enc_0.1-3 DBI_0.4-1
#> [13] prabclus_2.2-6 registry_0.3 fpc_2.1-10
#> [16] foreach_1.4.3 plyr_1.8.4 robustbase_0.92-5
#> [19] stringr_1.0.0 munsell_0.4.3 gtable_0.2.0
#> [22] caTools_1.17.1 htmlwidgets_0.6 mvtnorm_1.0-5
#> [25] codetools_0.2-14 evaluate_0.9 labeling_0.3
#> [28] seriation_1.2-0 flexmix_2.3-13 class_7.3-14
#> [31] DEoptimR_1.0-4 trimcluster_0.1-2 Rcpp_0.12.5
#> [34] KernSmooth_2.23-15 scales_0.4.0 diptest_0.75-7
#> [37] formatR_1.4 gdata_2.17.0 jsonlite_1.0
#> [40] gplots_3.0.1 gridExtra_2.2.1 digest_0.6.9
#> [43] stringi_1.1.1 gclus_1.3.1 dplyr_0.5.0
#> [46] grid_3.3.0 bitops_1.0-6 tools_3.3.0
#> [49] magrittr_1.5 lazyeval_0.2.0 tibble_1.1
#> [52] cluster_2.0.4 whisker_0.3-2 tidyr_0.5.1
#> [55] dendextend_1.3.0 MASS_7.3-45 assertthat_0.1
#> [58] rmarkdown_0.9.6 httr_1.2.1 iterators_1.0.8
#> [61] R6_2.1.2 TSP_1.1-4 mclust_5.2
#> [64] nnet_7.3-12