Skip to content

lvdmaaten/bhtsne

Folders and files

NameName
Last commit message
Last commit date

Latest commit

cd619e6 · Apr 23, 2021

History

83 Commits
Feb 14, 2017
Mar 25, 2017
Feb 11, 2015
Mar 25, 2017
Mar 25, 2017
Jun 11, 2019
Jan 8, 2017
Feb 11, 2015
Feb 11, 2015
Apr 23, 2021
Feb 4, 2019
Feb 4, 2019
Feb 11, 2015

Repository files navigation

Build Status

This software package contains a Barnes-Hut implementation of the t-SNE algorithm. The implementation is described in this paper.

Installation

On Linux or OS X, compile the source using the following command:

g++ sptree.cpp tsne.cpp tsne_main.cpp -o bh_tsne -O2

The executable will be called bh_tsne.

On Windows using Visual C++, do the following in your command line:

  • Find the vcvars64.bat file in your Visual C++ installation directory. This file may be named vcvars64.bat or something similar. For example:
  // Visual Studio 12
  "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\vcvars64.bat"

  // Visual Studio 2013 Express:
  C:\VisualStudioExp2013\VC\bin\x86_amd64\vcvarsx86_amd64.bat
  • From cmd.exe, go to the directory containing that .bat file and run it.

  • Go to bhtsne directory and run:

  nmake -f Makefile.win all

The executable will be called windows\bh_tsne.exe.

Usage

The code comes with wrappers for Matlab and Python. These wrappers write your data to a file called data.dat, run the bh_tsne binary, and read the result file result.dat that the binary produces. There are also external wrappers available for Torch, R, and Julia. Writing your own wrapper should be straightforward; please refer to one of the existing wrappers for the format of the data and result files.

Demonstration of usage in Matlab:

filename = websave('mnist_train.mat', 'https://github.com/awni/cs224n-pa4/blob/master/Simple_tSNE/mnist_train.mat?raw=true');
load(filename);
numDims = 2; pcaDims = 50; perplexity = 50; theta = .5; alg = 'svd';
map = fast_tsne(digits', numDims, pcaDims, perplexity, theta, alg);
gscatter(map(:,1), map(:,2), labels');

Demonstration of usage in Python:

import numpy as np
import bhtsne

data = np.loadtxt("mnist2500_X.txt", skiprows=1)

embedding_array = bhtsne.run_bh_tsne(data, initial_dims=data.shape[1])

Python Wrapper

Usage:

python bhtsne.py [-h] [-d NO_DIMS] [-p PERPLEXITY] [-t THETA]
                  [-r RANDSEED] [-n INITIAL_DIMS] [-v] [-i INPUT]
                  [-o OUTPUT] [--use_pca] [--no_pca] [-m MAX_ITER]

Below are the various options the wrapper program bhtsne.py expects:

  • -h, --help show this help message and exit
  • -d NO_DIMS, --no_dims NO_DIMS
  • -p PERPLEXITY, --perplexity PERPLEXITY
  • -t THETA, --theta THETA
  • -r RANDSEED, --randseed RANDSEED
  • -n INITIAL_DIMS, --initial_dims INITIAL_DIMS
  • -v, --verbose
  • -i INPUT, --input INPUT: the input file, expects a TSV with the first row as the header.
  • -o OUTPUT, --output OUTPUT: A TSV file having each row as the d dimensional embedding.
  • --use_pca
  • --no_pca
  • -m MAX_ITER, --max_iter MAX_ITER