FastMap Dimension Reduction in the Browser
FastMap is a dimensionality reduction algorithm; a faster alternative to MDS and t-SNE. You give it the pairwise distances between all your N-dimensional points and it gives you K-dimensional (say, 2D) coordinates for each of your data points. For the mathematically inclined ones, FastMap is a ℛN⟼ℛK mapping where usually, but not necessarily, {2,3}∍K≪N holds.
Tutorial
Let's look at an easy example first. I generated a bunch of 2D-points with random positions and plotted them for your viewing pleasure. The number on a point is its unique ID, the only use of which is to make it easy to identify individual points. Hover one point with your cursor and you'll see two lines connecting it with the previous and the next point by ID.
points = ([Math.random(), Math.random()] for _ in [0..29])
Computing the pairwise euclidean distances between all of these points results in the following matrix. (Notice the zero distances on the diagonal.)
# Euclidean distance between two n-dimensional points.
dist = (p1, p2) -> Math.sqrt d3.sum d3.zip(p1, p2).map ([p1x, p2x]) -> (d=p1x-p2x)*d
# A distance matrix. It's symmetric.
# Note that that generator generates a list of lists, unlike python's would!
dists = (dist p1, p2 for p1 in points for p2 in points)
# For larger datasets, creating a native array of floats will be more efficient.
darr = new Float32Array new ArrayBuffer 4*(l=points.length)*l
darr[i*l+j] = darr[j*l+i] = dist points[i], points[j] for i in [0..j] for j in [0..l-1]
Using only this distance matrix, FastMap can compute K-dimensional coordinates
for each point such that the distances are more-or-less preserved.
The number of dimensions K is specified through the optional argument ndims
but defaults to two if unspecified.
# Returns 2D coordinates given the array-of-arrays distance matrix.
# The `ndims: 2` can and will be omitted in the following examples.
coords = fastmap dists, ndims: 2
# Returns 2D coordinates given the native-float-array distance matrix.
coords = fastmap darr
# Returns 2D coordinates given the distance function and number of points.
# The function is called with the IDs of two points whose distance it should return.
coords = fastmap ((i,j) -> dist points[i], points[j]), npoints: points.length
The next figure shows the coordinates FastMap gave to our points in yellow. You can now explore this result by hovering the various points and see that both the distances and the angles are more-or-less preserved between the original and the FastMap points.
You might have noticed that while the distances and angles seem to match, the actual positions of the points are all screwed up! This is because, given only the distances between points, for any layout of the points, there are infinitely many rotated and mirrored versions of that layout which result in exactly the same distances. While there are ways to rotate all pointclouds to some "canonical" orientation, we don't need to worry about that for most use cases.
TL;DR of the Paper
For each dimension independently, FastMap somehow (i.e. heuristically) chooses two very distant points, call them pivots, and projects all other points onto the line spanned by those pivots. The projection is done in a certain way which roughly preserves angles and distances. This is done recursively for each dimension, and the distance measure is adapted in each recursion step.
The default way in which FastMap chooses the pivots is the following:
P = random point
pivotA = point which is furthest away from P
pivotB = point which is furthest away from pivotA
Pivot Constancy (Animations)
While the aforementioned way of picking pivots is both fast and good in most cases, it is problematic for animated datasets. If your points (and thus their distances) change over time, different pivots will potentially be chosen in each frame, resulting in a projection I like to call extasic.
This problem can easily be solved by reusing the same pivots throughout the
animation. FastMap accepts an optional pivots
argument which should be a
function(dist, nobjs, dim) -> [id1, id2]
. The dist
argument is a
function(i,j) -> d
which you can call in order to get the current distance
between i
and j
. (Which is not necessarily the same as the corresponding
entry of the distance-matrix you gave to FastMap; remember that FastMap
adapts the distance measure in each recursion step!) nobjs
is simply the total
number of datapoints and dim
is the dimension FastMap is currently working on.
The simplest thing to do is then to always return some constant pivot IDs, say the first and the last point.
coords = fastmap dists, pivots: (_, nobjs) -> [0, nobjs-1]
While the animation is not extasic anymore, it is not good either; all points are projected onto one line! What could be the cause of this?
Yes, indeed, I told you that the pivots
function is called once per dimension
and FastMap needs two pivots per dimension. Our function above returns the
exact same pivots for every dimension, thus all dimensions are projected onto
the same line, hence the observed result.
One probably trivial, certainly suboptimal, solution to this problem is:
coords = fastmap dists, pivots: (_, nobjs, dim) -> [dim, nobjs-1-dim]
Better yet but still not satisfying, since these pivots highly depends on the
order of your datapoints as opposed to their distances.
The way in which FastMap picks its pivots when no pivots
function is given is
quite good, so ideally we'd pick our pivots in the same way and then stick with
them throughout the whole animation.
In addition to the coordinates of the points, the fastmap
function can be told
to also return an array of the pivots it chose for each dimension.
This is done by setting the optional argument retpivots
to true.
[oldcoords, oldpivots] = fastmap dists, retpivots: true
Instead of a function, this array of pivots can now be passed as pivots
argument to subsequent calls of fastmap
.
newcoords = fastmap newdists, pivots: oldpivots
Purrrfect. Compare this result with the first example; I have used the pivots from that computation, thus the projection is the same.
More Examples
Used pivots: .
TODO: All datasets from the original paper, with a dropdown box. Maybe also those from the «Laplacian Eigenmaps for Dimensionality Reduction and Data Representation?»