1 Introduction

Dimensionality Reduction methods are either manifold learning approaches or methods of projection. Projection methods should be prefered if the goal is the visualization of cluster structures [Thrun, 2018]. Two-dimensional projections are visualized as scatter plot. The Johnson–Lindenstrauss lemma states that in such a case the low-dimensional similarities does not represent high-dimensional distances coercively (details in [Thrun/Ultsch,2018]). To solve this problem the high-dimensional distances can be visualized in the two-dimensional projection as 3D landscape of a topographic map with hypsometric tints[Thrun, 2018; Ultsch/Thrun, 2017; Thrun et al., 2016].

2 Generate a Two-Dimensional Scatter Plot of High-Dimensional Data

First generate a 2d projection, the DistanceMatrix has to be defined by the Please see the ProjectionBasedClustering package on CRAN for other common projection methods.

data("Chainlink")
Data=Chainlink$Data
Cls=Chainlink$Cls
require(DataVisualizations)
## Loading required package: DataVisualizations
DataVisualizations::plot3D(Data,Cls,main='Chainlink dataset')

InputDistances=as.matrix(dist(Data))
res=cmdscale(d=InputDistances, k = 2, eig = TRUE, add = FALSE, x.ret = FALSE)
ProjectedPoints=as.matrix(res$points)
plot(ProjectedPoints,col=Cls)

You must enable Javascript to view this page properly.

3 Compute the Generalized Umatrix and Visualize the Topographic Map

Here the Generalized Umatrix is calculated using a simplified emergent self-organizing map algorithm. Then, the visualization of Generalized Umatrix is done by a 3D landscape called topographic map with hypsometric tints. The resulting visualization will be toroidal meaning that the left borders cyclically connects to the right border (and bottom to top). It means there are no “real” borders in this visualizations. Instead, the visualization is “continuous”. This can be visualized using the ‘Tiled=TRUE’ option of ‘plotTopographicMap’.

Note, that the ‘NoLevels’ option is only set to load this vignette faster and should normally not be set manually. It describes the number contour lines placed relative to the hypsometric tints. All visualizations here are small and a low dpi is set in knitr in order to load the vignette faster.

visualization=GeneralizedUmatrix(Data,ProjectedPoints)
plotTopographicMap(visualization$Umatrix,visualization$Bestmatches,NoLevels=10)

You must enable Javascript to view this page properly.

You must enable Javascript to view this page properly.

3.1 Saving Output

You can save either the output as a STL for 3D printing (see [Thrun et al., 2016]) or as a picture:

# To save as STL for 3D printing
 rgl::writeSTL("GenerelizedUmatrix_3d_model.stl")

# Save the visualization as a picture with
rgl::rgl.snapshot('test.png')

4 Generating the Shape of an Island out of the Topographic Map

To generate the 3D landscape in the shape of an island from the toroidal topographic map visualization you may cut your island interactively around high mountain ranges. Currently, I am unable to show the output in R markdown :-( If you know how to resolve the Rmarkdown issue, please mail me: info@deepbionics.org

library(ProjectionBasedClustering)
library(GeneralizedUmatrix)

Imx = ProjectionBasedClustering::interactiveGeneralizedUmatrixIsland(visualization$Umatrix,visualization$Bestmatches,Cls)

plotTopographicMap(visualization$Umatrix,visualization$Bestmatches, Cls=Cls,Imx = Imx)

4.1 Manually Clustering the Projection Using the Topographic Map

In this example, the four outliers can be marked manually with mouse clicks using the shiny interface. Currently, I am unable to show the output in R markdown :-( Please try it out yourself:

library(ProjectionBasedClustering)
Cls2=ProjectionBasedClustering::interactiveClustering(visualization$Umatrix, visualization$Bestmatches, Cls)

5 Quality Measures of Projection

5.1 Delaunay Classification Error

Using insights of graph theory, the Delaunay classification error calculates for each projected point the direct neighborhood based on the Delaunay graph. Every direct Connection is weighted with the high-dimensional distance between the two corresponding data points and sorted per neighborhood by these weights. In the next step all sorted projected points points of the direct neighborhood of each projected points aquire new weights according to the harmonic function. Then, the prior classification is used to check which points do not belong to these direct neighborhoods of projected points. The weights of these points are summed up. A lower value indicates a better two-dimensional projection of the high-dimensional Input space. A higher value indicates a worse two-dimensional projection of the high-dimensional Input space.

library(DatabionicSwarm)
DelaunayClassificationError(Chainlink$Data,projection$ProjectedPoints,Chainlink$Cls)

You can also compare various projections method to a common baseline together:

library(DatabionicSwarm)
DCEpswarm=DelaunayClassificationError(Chainlink$Data,projection$ProjectedPoints,Chainlink$Cls)$DCE
baselineproj=ProjectionBasedClustering::NeRV(Chainlink$Data)
DCEpswarm=DelaunayClassificationError(Chainlink$Data,baselineproj,Chainlink$Cls)$DCE
RelativeDifference(DCEpswarm,baselineproj)

This has the advantage of an clear range of \([-2,2]\). Further Details can be read in the conference presentation attached to [Thrun/Ultsch,2018] on ResearchGate.

6 References

[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, https://doi.org/10.1007/978-3-658-20540-9, 2018.

[Thrun/Ultsch,2018] Thrun, M. C., & Ultsch, A. : Investigating Quality measurements of projections for the Evaluation of Distance and Density-based Structures of High-Dimensional Data, Proc. European Conference on Data Analysis (ECDA), Paderborn, Germany. 2018a.

[Ultsch/Thrun, 2017] Ultsch, A., & Thrun, M. C.: Credible Visualizations for Planar Projections, in Cottrell, M. (Ed.), 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM), IEEE Xplore, France, 2017.

[Thrun et al., 2016] Thrun, M. C., Lerch, F., Loetsch, J., & Ultsch, A.: Visualization and 3D Printing of Multivariate Data of Biomarkers, in Skala, V. (Ed.), International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG), Vol. 24, Plzen, http://wscg.zcu.cz/wscg2016/short/A43-full.pdf, 2016.