Dimensionality Reduction of a FluidDataSet Using Multidimensional Scaling
Multidimensional Scaling transforms a dataset to a lower number of dimensions while trying to preserve the distance relationships between the data points, so that even with fewer dimensions, the differences and similarities between points can still be observed and used effectively.
First, MDS computes a distance matrix by calculating the distance between every pair of points in the dataset. It then positions all the points in the lower number of dimensions (specified by numDimensions) and iteratively shifts them around until the distances between the points in the lower-dimensional space are as close as possible to the distances in the original space.
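The two steps described above can be sketched in plain Python. This is only an illustration of the general idea (a pairwise distance matrix followed by iterative repositioning), not FluCoMa's implementation: the function names, the gradient-descent update, the learning rate, and the iteration count are all assumptions made for this example.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(points, metric=euclidean):
    """Distance between every pair of points (step one of MDS)."""
    n = len(points)
    return [[metric(points[i], points[j]) for j in range(n)] for i in range(n)]


def stress(target, layout):
    """Sum of squared mismatches between target and layout distances."""
    n = len(layout)
    return sum(
        (euclidean(layout[i], layout[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def mds_sketch(points, num_dimensions=2, iterations=500, rate=0.05, seed=0):
    """Hypothetical helper: returns (layout, stress_before, stress_after)."""
    rng = random.Random(seed)
    target = distance_matrix(points)
    n = len(points)
    # Start from a random placement in the lower-dimensional space.
    layout = [[rng.uniform(-1, 1) for _ in range(num_dimensions)] for _ in range(n)]
    before = stress(target, layout)
    for _ in range(iterations):
        for i in range(n):
            grad = [0.0] * num_dimensions
            for j in range(n):
                if i == j:
                    continue
                d = euclidean(layout[i], layout[j])
                if d == 0:
                    continue  # coincident points give no usable direction
                # Positive when the pair is too far apart, negative when too close.
                scale = (d - target[i][j]) / d
                for k in range(num_dimensions):
                    grad[k] += scale * (layout[i][k] - layout[j][k])
            # Shift point i so its distances move toward the target distances.
            for k in range(num_dimensions):
                layout[i][k] -= rate * grad[k]
    return layout, before, stress(target, layout)
```

Calling `mds_sketch` on, say, four three-dimensional points returns four two-dimensional points together with the mismatch before and after the iterations, which makes it easy to check that the layout has moved closer to the original distance relationships.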
What makes this MDS implementation more flexible than some of the other dimensionality reduction algorithms in FluCoMa is that MDS allows for different measures of distance to be used (see list below).
Note that unlike the other dimensionality reduction algorithms, MDS does not have a fit or transform method, nor can it transform data points in buffers. This is because the algorithm performs the fit and the transform as a single step, using only the data provided in the source DataSet, so incorporating new data points would require re-fitting the model.
Manhattan Distance: The sum of the absolute value difference between points in each dimension. This is also called the Taxicab Metric. https://en.wikipedia.org/wiki/Taxicab_geometry
Euclidean Distance: The square root of the sum of the squared differences between points in each dimension (Pythagorean Theorem). https://en.wikipedia.org/wiki/Euclidean_distance This metric is the default, as it is the most commonly used.
Squared Euclidean Distance: The square of the Euclidean Distance between points. This distance measure penalises larger distances more strongly, making them seem more distant, which may reveal more clustered points. https://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
Minkowski Max Distance: The distance between two points is reported as the largest difference between those two points in any one dimension. Also called the Chebyshev Distance or the Chessboard Distance. https://en.wikipedia.org/wiki/Chebyshev_distance
Minkowski Min Distance: The distance between two points is reported as the smallest difference between those two points in any one dimension.
Symmetric Kullback Leibler Divergence: Because the first part of this computation uses the logarithm of the values, using the Symmetric Kullback Leibler Divergence only makes sense with non-negative data. https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Symmetrised_divergence
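The six measures listed above can be written out as short Python functions to make the differences concrete. This is a sketch using the textbook definitions, not FluCoMa's source code. In particular, the source does not give the Symmetric Kullback Leibler formula: the one used here is the standard symmetrised divergence, and this version assumes strictly positive values because it takes the logarithm of a ratio. All function names are made up for this example.

```python
import math


def manhattan(a, b):
    """Sum of absolute differences in each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Euclidean distance squared, so large gaps count for more."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def minkowski_max(a, b):
    """Largest difference in any one dimension (Chebyshev)."""
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski_min(a, b):
    """Smallest difference in any one dimension."""
    return min(abs(x - y) for x, y in zip(a, b))


def symmetric_kl(a, b):
    """Symmetrised KL divergence (assumed form): KL(a||b) + KL(b||a).

    Requires every value in a and b to be strictly positive.
    """
    return sum((x - y) * math.log(x / y) for x, y in zip(a, b))
```

Any of these functions can be called on two equal-length lists of numbers. Running all six on the same pair of points shows how differently they weight a single large difference against several small ones.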
Read more about FluidMDS on the learn platform.
server |
The Server on which to construct this object |
numDimensions |
The number of dimensions to reduce to |
distanceMetric |
The distance metric to use (integer 0-5) |
Property for numDimensions. See new
Property for distanceMetric. See new
Fit the model to a FluidDataSet and write the new projected data to a destination DataSet.
sourceDataSet |
Source DataSet |
destDataSet |
Destination DataSet |
action |
A function to execute when the server has completed running fitTransform |
Comparing Distance Measures
Just looking at these plots won't really reveal the differences between these distance measures. The best way to find out which one suits your material is to test them on your own data and listen to the musical differences they create!