FastMap Dimension Reduction in the Browser

You disabled JavaScript. Please enable it for syntax-highlighting, or don't complain about unlegible code snippets =) This page doesn't contain any tracking/analytics/ad code.

FastMap Dimension Reduction in the Browser

FastMap is a dimensionality reduction algorithm; a faster alternative to MDS and t-SNE. You give it the pairwise distances between all your N-dimensional points and it gives you K-dimensional (say, 2D) coordinates for each of your data points. For the mathematically inclined ones, FastMap is a ℛ^N⟼ℛ^K mapping where usually, but not necessarily, {2,3}∍K≪N holds.

Tutorial

Let's look at an easy example first. I generated a bunch of 2D-points with random positions and plotted them for your viewing pleasure. The number on a point is its unique ID, the only use of which is to make it easy to identify individual points. Hover one point with your cursor and you'll see two lines connecting it with the previous and the next point by ID.

points = ([Math.random(), Math.random()] for _ in [0..29])

Computing the pairwise euclidean distances between all of these points results in the following matrix. (Notice the zero distances on the diagonal.)

# Euclidean distance between two n-dimensional points.
dist = (p1, p2) -> Math.sqrt d3.sum d3.zip(p1, p2).map ([p1x, p2x]) -> (d=p1x-p2x)*d

# A distance matrix. It's symmetric.
# Note that that generator generates a list of lists, unlike python's would!
dists = (dist p1, p2 for p1 in points for p2 in points)

# For larger datasets, creating a native array of floats will be more efficient.
darr = new Float32Array new ArrayBuffer 4*(l=points.length)*l
darr[i*l+j] = darr[j*l+i] = dist points[i], points[j] for i in [0..j] for j in [0..l-1]

Using only this distance matrix, FastMap can compute K-dimensional coordinates for each point such that the distances are more-or-less preserved. The number of dimensions K is specified through the optional argument ndims but defaults to two if unspecified.

# Returns 2D coordinates given the array-of-arrays distance matrix.
# The `ndims: 2` can and will be omitted in the following examples.
coords = fastmap dists, ndims: 2

# Returns 2D coordinates given the native-float-array distance matrix.
coords = fastmap darr

# Returns 2D coordinates given the distance function and number of points.
# The function is called with the IDs of two points whose distance it should return.
coords = fastmap ((i,j) -> dist points[i], points[j]), npoints: points.length

The next figure shows the coordinates FastMap gave to our points in yellow. You can now explore this result by hovering the various points and see that both the distances and the angles are more-or-less preserved between the original and the FastMap points.

You might have noticed that while the distances and angles seem to match, the actual positions of the points are all screwed up! This is because, given only the distances between points, for any layout of the points, there are infinitely many rotated and mirrored versions of that layout which result in exactly the same distances. While there are ways to rotate all pointclouds to some "canonical" orientation, we don't need to worry about that for most use cases.

TL;DR of the Paper

For each dimension independently, FastMap somehow (i.e. heuristically) chooses two very distant points, call them pivots, and projects all other points onto the line spanned by those pivots. The projection is done in a certain way which roughly preserves angles and distances. This is done recursively for each dimension, and the distance measure is adapted in each recursion step.

The default way in which FastMap chooses the pivots is the following:

P = random point
pivotA = point which is furthest away from P
pivotB = point which is furthest away from pivotA

Pivot Constancy (Animations)

While the aforementioned way of picking pivots is both fast and good in most cases, it is problematic for animated datasets. If your points (and thus their distances) change over time, different pivots will potentially be chosen in each frame, resulting in a projection I like to call extasic.

This problem can easily be solved by reusing the same pivots throughout the animation. FastMap accepts an optional pivots argument which should be a function(dist, nobjs, dim) -> [id1, id2]. The dist argument is a function(i,j) -> d which you can call in order to get the current distance between i and j. (Which is not necessarily the same as the corresponding entry of the distance-matrix you gave to FastMap; remember that FastMap adapts the distance measure in each recursion step!) nobjs is simply the total number of datapoints and dim is the dimension FastMap is currently working on.

The simplest thing to do is then to always return some constant pivot IDs, say the first and the last point.

coords = fastmap dists, pivots: (_, nobjs) -> [0, nobjs-1]

While the animation is not extasic anymore, it is not good either; all points are projected onto one line! What could be the cause of this?

Yes, indeed, I told you that the pivots function is called once per dimension and FastMap needs two pivots per dimension. Our function above returns the exact same pivots for every dimension, thus all dimensions are projected onto the same line, hence the observed result. One probably trivial, certainly suboptimal, solution to this problem is:

coords = fastmap dists, pivots: (_, nobjs, dim) -> [dim, nobjs-1-dim]

Better yet but still not satisfying, since these pivots highly depends on the order of your datapoints as opposed to their distances. The way in which FastMap picks its pivots when no pivots function is given is quite good, so ideally we'd pick our pivots in the same way and then stick with them throughout the whole animation.

In addition to the coordinates of the points, the fastmap function can be told to also return an array of the pivots it chose for each dimension. This is done by setting the optional argument retpivots to true.

[oldcoords, oldpivots] = fastmap dists, retpivots: true

Instead of a function, this array of pivots can now be passed as pivots argument to subsequent calls of fastmap.

newcoords = fastmap newdists, pivots: oldpivots

Purrrfect. Compare this result with the first example; I have used the pivots from that computation, thus the projection is the same.

More Examples

Used pivots: .

TODO: All datasets from the original paper, with a dropdown box. Maybe also those from the «Laplacian Eigenmaps for Dimensionality Reduction and Data Representation?»