tSNE_df makes use of Rtsne::Rtsne, which is a wrapper for the C++ implementation of Barnes-Hut t-Distributed Stochastic Neighbor Embedding. t-SNE is a method for constructing a low-dimensional embedding of high-dimensional data, distances, or similarities. Exact t-SNE can be computed by setting theta = 0.0.
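As a rough illustration of the embedding side of the method, the sketch below computes the Student-t similarities q_ij that t-SNE assigns to pairs of points in a low-dimensional layout Y. It is plain base R written for this page, not code from Rtsne, and the function name tsne_q is hypothetical. It computes every pair exactly, without the Barnes-Hut approximation that theta > 0 enables.

```r
# Minimal sketch of t-SNE's low-dimensional similarities:
# q_ij = (1 + ||y_i - y_j||^2)^-1 / sum over k != l of (1 + ||y_k - y_l||^2)^-1
tsne_q <- function(Y) {
  D2 <- as.matrix(dist(Y))^2   # squared Euclidean distances between rows of Y
  W <- 1 / (1 + D2)            # Student-t kernel with one degree of freedom
  diag(W) <- 0                 # a point is not its own neighbor
  W / sum(W)                   # normalize so that all q_ij sum to 1
}

# Three points: point 2 is closer to point 1 than point 3 is
Y <- matrix(c(0, 0,
              1, 0,
              0, 2), ncol = 2, byrow = TRUE)
Q <- tsne_q(Y)
Q
```

Closer points in Y receive larger q_ij; t-SNE moves the points so that these similarities match the input-space ones.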

tSNE_df(
  data,
  dims = 2,
  initial_dims = 50,
  perplexity = 3,
  theta = 0.5,
  check_duplicates = TRUE,
  pca = TRUE,
  partial_pca = FALSE,
  max_iter = 1000,
  verbose = FALSE,
  is_distance = FALSE,
  Y_init = NULL,
  pca_center = TRUE,
  pca_scale = FALSE,
  normalize = TRUE,
  stop_lying_iter = ifelse(is.null(Y_init), 250L, 0L),
  mom_switch_iter = ifelse(is.null(Y_init), 250L, 0L),
  momentum = 0.5,
  final_momentum = 0.8,
  eta = 200,
  exaggeration_factor = 12,
  num_threads = 1
)

Arguments

data

A data frame object or matrix.

dims

integer; Output dimensionality (default: 2)

initial_dims

integer; the number of dimensions that should be retained in the initial PCA step (default: 50)

perplexity

numeric; Perplexity parameter, loosely the effective number of neighbors considered for each point; 3 * perplexity should be smaller than nrow(data) - 1 (default: 3)

theta

numeric; Speed/accuracy trade-off (increase for less accuracy); set to 0.0 for exact t-SNE (default: 0.5)

check_duplicates

logical; Whether to check for duplicate rows. It is best to remove duplicates beforehand and set this option to FALSE, especially for large datasets (default: TRUE)

pca

logical; Whether an initial PCA step should be performed (default: TRUE)

partial_pca

logical; Whether truncated PCA should be used to calculate principal components (requires the irlba package). This is faster for large input matrices (default: FALSE)

max_iter

integer; Number of iterations (default: 1000)

verbose

logical; Whether progress updates should be printed (default: FALSE)

is_distance

logical; Whether data is a distance matrix (default: FALSE)

Y_init

matrix; Initial locations of the objects. If NULL, random initialization is used (default: NULL). When Y_init is supplied, stop_lying_iter and mom_switch_iter default to 0, so the initial stage, with an exaggerated P matrix and the initial momentum, is skipped.

pca_center

logical; Should data be centered before PCA is applied? (default: TRUE)

pca_scale

logical; Should data be scaled before PCA is applied? (default: FALSE)

normalize

logical; Should data be normalized internally with normalize_input prior to distance calculations? (default: TRUE)

stop_lying_iter

integer; Iteration after which the P matrix is no longer multiplied by exaggeration_factor (default: 250, except when Y_init is used, then 0)

mom_switch_iter

integer; Iteration after which the final momentum is used (default: 250, except when Y_init is used, then 0)

momentum

numeric; Momentum used in the first part of the optimization (default: 0.5)

final_momentum

numeric; Momentum used in the final part of the optimization (default: 0.8)

eta

numeric; Learning rate (default: 200.0)

exaggeration_factor

numeric; Exaggeration factor used to multiply the P matrix in the first part of the optimization (default: 12.0)

num_threads

integer; Number of threads to use with OpenMP (default: 1). Setting this to 0 detects and uses all available cores

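The perplexity argument is commonly read as a smooth measure of the effective number of neighbors of each point. The sketch below shows, for a single point, how a target perplexity can be matched by bisecting over the Gaussian precision beta = 1 / (2 * sigma^2). It is a simplified base R illustration (calibrate_row is a hypothetical name), not Rtsne's C++ code, whose search details may differ.

```r
# Sketch: find the conditional distribution p_{j|i} for one point i whose
# perplexity 2^H (H = Shannon entropy in bits) matches a target value.
calibrate_row <- function(d2, target_perplexity, tol = 1e-5, max_iter = 100) {
  # d2: squared distances from point i to every other point (self excluded)
  lo <- 0
  hi <- Inf
  beta <- 1
  for (k in seq_len(max_iter)) {
    w <- exp(-beta * (d2 - min(d2)))       # shifting by min(d2) leaves p unchanged
    p <- w / sum(w)
    H <- -sum(p[p > 0] * log2(p[p > 0]))   # entropy in bits
    if (abs(2^H - target_perplexity) < tol) break
    if (2^H > target_perplexity) {         # too flat: increase beta
      lo <- beta
      beta <- if (is.finite(hi)) (beta + hi) / 2 else beta * 2
    } else {                               # too peaked: decrease beta
      hi <- beta
      beta <- (beta + lo) / 2
    }
  }
  p
}

p <- calibrate_row(c(1, 2, 4, 8, 16), target_perplexity = 3)
round(p, 3)
```

A larger perplexity spreads probability over more neighbors, which is why it must stay small relative to the number of rows.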
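The arguments pca, initial_dims, pca_center, pca_scale and normalize describe preprocessing applied before the neighbor search. The following base R sketch approximates it, assuming PCA is applied first (via stats::prcomp rather than irlba) and that normalize_input centers each column and then divides by the largest absolute value in the matrix. preprocess_sketch is a hypothetical name; this is not Rtsne's implementation.

```r
# Sketch of the preprocessing controlled by pca, initial_dims, pca_center,
# pca_scale and normalize (an approximation, not Rtsne's actual code).
preprocess_sketch <- function(X, initial_dims = 50, pca = TRUE,
                              pca_center = TRUE, pca_scale = FALSE,
                              normalize = TRUE) {
  X <- as.matrix(X)
  if (pca) {
    pc <- prcomp(X, center = pca_center, scale. = pca_scale)
    k <- min(initial_dims, ncol(pc$x))        # cannot keep more PCs than exist
    X <- pc$x[, seq_len(k), drop = FALSE]     # keep the first k components
  }
  if (normalize) {
    X <- sweep(X, 2, colMeans(X))             # center each column
    X <- X / max(abs(X))                      # largest absolute value becomes 1
  }
  X
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)
Z <- preprocess_sketch(X, initial_dims = 3)
dim(Z)
```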
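stop_lying_iter, mom_switch_iter, momentum, final_momentum and exaggeration_factor together define a two-phase optimization schedule. The hypothetical helper schedule_at below returns the settings in effect at a given iteration. It counts iterations from 0 and treats the switch iteration itself as the first iteration of the second phase; both conventions are assumptions, and the exact boundary in Rtsne's C++ loop may differ by one. Passing 0L for both switch iterations mimics the defaults used when Y_init is supplied.

```r
# Hypothetical helper: settings in effect at 0-based iteration `iter`.
# Assumption: the switch iteration already uses the second-phase values.
schedule_at <- function(iter,
                        stop_lying_iter = 250L, mom_switch_iter = 250L,
                        momentum = 0.5, final_momentum = 0.8,
                        exaggeration_factor = 12) {
  list(
    # P is multiplied by exaggeration_factor only before stop_lying_iter
    p_multiplier = if (iter < stop_lying_iter) exaggeration_factor else 1,
    # updates use `momentum` before mom_switch_iter, then final_momentum
    momentum = if (iter < mom_switch_iter) momentum else final_momentum
  )
}

schedule_at(100)                                            # early phase
schedule_at(600)                                            # final phase
schedule_at(0, stop_lying_iter = 0L, mom_switch_iter = 0L)  # as with Y_init
```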

References

Krijthe, J. H. (2015). Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation, URL: https://github.com/jkrijthe/Rtsne

Author

D. Schmitz

Examples


tSNE_df(gdsm_df)
#>            tSNE1       tSNE2
#> var01  44.600084   61.214476
#> var02 -78.204693  -78.844628
#> var03 -71.093507 -112.740061
#> var04  40.288728   82.899071
#> var05 -14.204317   -9.523896
#> var06   5.604320   26.788401
#> var07 -73.706376  -98.416387
#> var08 -51.391025   67.204823
#> var09  14.438828   44.324426
#> var10  92.967784   15.703337
#> var11  33.937197   50.806443
#> var12  20.753213  -47.637960
#> var13 -52.567206   86.943908
#> var14   0.822775  -21.488638
#> var15  92.560289    1.265384
#> var16 -12.479836  -78.672555
#> var17  62.961459   52.551014
#> var18   3.953976  -37.333902
#> var19 -19.133514  -64.451974
#> var20 -40.108181   59.408720
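When applying tSNE_df to other data, the checks implied by check_duplicates, perplexity and is_distance can be done up front with base R. In this sketch the toy data and the helper max_perplexity are hypothetical, and the tSNE_df calls are left as comments because they require the package that provides tSNE_df.

```r
# Base R preparation of input for tSNE_df (toy data; tSNE_df calls commented out).
max_perplexity <- function(data) {
  # Hypothetical helper: largest integer p with 3 * p < nrow(data) - 1,
  # i.e. 3 * p <= nrow(data) - 2 for integer row counts.
  max(floor((nrow(data) - 2) / 3), 0)
}

set.seed(42)
X0 <- matrix(rnorm(40), ncol = 2)
X0 <- rbind(X0, X0[1:2, ])   # 22 rows; the last two duplicate rows 1 and 2

X <- unique(X0)              # drop duplicated rows, leaving 20 rows
max_perplexity(X)            # 6, so the default perplexity = 3 is valid
# tSNE_df(X, check_duplicates = FALSE)

D <- as.matrix(dist(X))      # Euclidean distances between rows
# tSNE_df(D, is_distance = TRUE)
```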