"UMAP" (Machine Learning Method)
- Method for DimensionReduction, DimensionReduce, FeatureSpacePlot and FeatureSpacePlot3D.
- Reduce the dimension of data using uniform manifold approximation and projection.
Details & Suboptions
- "UMAP", which stands for uniform manifold approximation and projection, is a nonlinear nonparametric dimensionality reduction method. The method attempts to learn a low-dimensional representation of the data that preserves the local structure of the data in balance with the global structure.
- "UMAP" works for datasets with nonlinear manifolds and is particularly suited for the visualization of high-dimensional datasets.
- The following shows two-dimensional embeddings learned by the "UMAP" method applied to the benchmark datasets Fisher's Irises, MNIST and FashionMNIST:
- UMAP constructs a high-dimensional graph representation of the data, then optimizes a low-dimensional graph to be as structurally similar to it as possible.
- The high-dimensional graph is a weighted graph whose edge weights represent the likelihood that two points are connected. To compute them, UMAP chooses a radius locally for each point, based on the distance to that point's nearest neighbors. The likelihood that two points are connected then decreases exponentially with the ratio of the distance between the points to this radius.
- Once the high-dimensional graph is constructed, UMAP optimizes the layout of a low-dimensional analog so that it is as similar as possible to the high-dimensional graph.
- By stipulating that each point must be connected to at least its closest neighbor, UMAP ensures that local structure is preserved in balance with global structure.
- The following suboptions can be given:
"MinDistance"        0.1    minimum distance between points in low-dimensional space
"NeighborsNumber"    15     number of nearest neighbors to construct the high-dimensional graph
- "MinDistance" controls how tightly UMAP clumps points together, with low values leading to more tightly packed embeddings. Larger values will make UMAP pack points together more loosely, focusing instead on the preservation of the broad topological structure.
- "NeighborsNumber" effectively controls how UMAP balances local versus global structure. Low values push UMAP to focus more on local structure, while high values push it toward representing the big-picture structure at the cost of fine detail.
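The graph construction described above, and the role the neighborhood size plays in it, can be illustrated with a minimal sketch. This is not the internal implementation of the "UMAP" method: the random points, the variable names, the choice of the k-th nearest-neighbor distance as the local radius and the plain form Exp[-distance/radius] are all simplifying assumptions made for illustration.

```wolfram
(* Illustrative sketch only; not the internal implementation of "UMAP" *)
SeedRandom[1];
points = RandomReal[1, {8, 5}];   (* 8 hypothetical points in 5 dimensions *)
k = 3;                            (* neighborhood size, playing the role of "NeighborsNumber" *)

(* pairwise Euclidean distances between all points *)
distances = DistanceMatrix[points];

(* assumed local radius of each point: the distance to its k-th nearest neighbor;
   the first sorted entry is the zero distance from the point to itself *)
radii = Sort[#][[k + 1]] & /@ distances;

(* weights[[i, j]] decays exponentially with distance(i, j) / radius(i);
   row i of the distance matrix is divided by radii[[i]] *)
weights = Exp[-distances/radii];
```

Because each row is scaled by its own radius, the resulting matrix is not symmetric in general. Increasing k enlarges every radius, so more distant points receive appreciable weights, which matches the qualitative effect described for larger "NeighborsNumber" values.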
Examples
Basic Examples (1)
Load the Fisher Iris dataset from ExampleData:
Generate a reducer function using the "UMAP" method:
Group the examples by their species:
Reduce the dimension of the features:
Visualize the reduced dataset:
Perform the same operation using a different number of nearest neighbors to construct the high-dimensional graph:
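The steps above can be sketched as follows. The variable names, the explicit target dimension 2, the use of ListPlot for the visualization and the value 50 for "NeighborsNumber" are assumptions chosen for illustration; the suboption is passed in the form Method -> {"UMAP", suboption -> value}.

```wolfram
(* Sketch of the steps above; names and parameter values are illustrative *)

(* load the dataset as a list of rules: features -> species *)
iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];

(* generate a reducer function from the feature vectors *)
reducer = DimensionReduction[Keys[iris], 2, Method -> "UMAP"];

(* group the feature vectors by species *)
bySpecies = GroupBy[iris, Last -> First];

(* reduce the dimension of the features in each group *)
reduced = reducer /@ bySpecies;

(* visualize the reduced dataset, one point style per species *)
ListPlot[Values[reduced], PlotLegends -> Keys[reduced]]

(* repeat with a different number of nearest neighbors *)
reducer50 = DimensionReduction[Keys[iris], 2,
   Method -> {"UMAP", "NeighborsNumber" -> 50}];
ListPlot[Values[reducer50 /@ bySpecies], PlotLegends -> Keys[bySpecies]]
```

Comparing the two plots gives a qualitative sense of how a larger neighborhood shifts the embedding toward global structure.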