"UMAP" (Machine Learning Method)

Details & Suboptions

  • "UMAP", which stands for uniform manifold approximation and projection, is a nonlinear nonparametric dimensionality reduction method. The method attempts to learn a low-dimensional representation of the data that preserves the local structure of the data in balance with the global structure.
  • "UMAP" works for datasets with nonlinear manifolds and is particularly suited for the visualization of high-dimensional datasets.
  • The following shows two-dimensional embeddings learned by the "UMAP" method applied to the benchmark datasets Fisher's Irises, MNIST and FashionMNIST:
  • UMAP constructs a high-dimensional graph representation of the data and then optimizes a low-dimensional graph to be as structurally similar to it as possible.
  • In order to construct the initial high-dimensional graph, UMAP builds a weighted graph, with edge weights representing the likelihood that two points are connected. To do so, UMAP chooses a radius locally, based on the distance to each point's nearest neighbors. The likelihood of two points being connected then decreases exponentially with the ratio of the distance between the points to this radius.
  • Once the high-dimensional graph is constructed, UMAP optimizes the layout of a low-dimensional analog so that it is as similar as possible to the high-dimensional graph.
  • By stipulating that each point must be connected to at least its closest neighbor, UMAP ensures that local structure is preserved in balance with global structure.
  • The following suboptions can be given:
  • "MinDistance"        0.1    minimum distance between points in low-dimensional space
    "NeighborsNumber"    15     number of nearest neighbors used to construct the high-dimensional graph
  • "MinDistance" controls how tightly UMAP clumps points together, with low values leading to more tightly packed embeddings. Higher values make UMAP pack points together more loosely, focusing instead on the preservation of the broad topological structure.
  • "NeighborsNumber" effectively controls how UMAP balances local versus global structure. Low values push UMAP to focus on local structure, while high values push it toward representing the big-picture structure at the cost of fine detail.
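
As a rough illustration of the graph-construction step described above, the following sketch computes connection likelihoods for a toy point set. It is a simplification and not the built-in implementation: the local radius is assumed to be the distance to the k-th nearest neighbor, and the names localRadius and connectionLikelihood are hypothetical helpers introduced only for this example.

```wolfram
(* Toy data: three nearby points and one distant point *)
points = {{0., 0.}, {1., 0.}, {0., 2.}, {5., 5.}};
k = 2;

(* Assumption: the local radius of a point is the distance to its k-th nearest neighbor.
   Nearest returns the point itself first (distance 0), hence k + 1. *)
localRadius[p_] := Last[Nearest[points -> "Distance", p, k + 1]];

(* Likelihood that p connects to q: decays exponentially with distance/radius *)
connectionLikelihood[p_, q_] := Exp[-EuclideanDistance[p, q]/localRadius[p]];

(* Matrix of likelihoods between all pairs of points (diagonal entries are trivially 1) *)
Outer[connectionLikelihood, points, points, 1] // MatrixForm
```

Because each point uses its own radius, the resulting matrix is not symmetric: the isolated point {5., 5.} has a large radius and therefore assigns a fairly high likelihood to its nearest neighbors, while those neighbors, with their smaller radii, assign it a low one.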

Examples

Basic Examples  (1)

Reduce the dimension of some images using the "UMAP" method:

Visualize the two-dimensional representation of images:
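
A minimal sketch of the two steps above, assuming the images are a random sample from the "MNIST" data resource (an illustrative choice; ResourceData needs network access on first use, and the sample size of 300 is arbitrary):

```wolfram
(* Assumption: use handwritten-digit images from the "MNIST" data resource *)
sample = RandomSample[ResourceData["MNIST", "TrainingData"], 300];
images = Keys[sample];

(* Reduce each image to a two-dimensional vector with the "UMAP" method *)
reduced = DimensionReduce[images, 2, Method -> "UMAP"];

(* Plot the reduced points; hovering over a point shows the corresponding image *)
ListPlot[MapThread[Tooltip, {reduced, images}]]
```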

Options  (2)

"MinDistance"  (1)

Load a sample from the "MNIST" dataset:

Reduce the dimension of images using "UMAP":

Reduce the dimension again, this time running the UMAP method with the "MinDistance" suboption specified:

Visualize the obtained features and compare the results:
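
The steps of this example might look like the following sketch. The sample size, the value 0.8 and the helper plotByDigit are assumptions chosen for illustration; only the suboption name and its default of 0.1 come from the table of suboptions.

```wolfram
(* Assumption: a random sample of 500 labeled digits from the "MNIST" data resource *)
sample = RandomSample[ResourceData["MNIST", "TrainingData"], 500];
images = Keys[sample];
digits = Values[sample];

(* Default setting ("MinDistance" -> 0.1) versus a larger, arbitrarily chosen value *)
tight = DimensionReduce[images, 2, Method -> "UMAP"];
loose = DimensionReduce[images, 2, Method -> {"UMAP", "MinDistance" -> 0.8}];

(* Hypothetical helper: scatter plot of the reduced points, one color per digit *)
plotByDigit[pts_, title_] := Module[{groups},
  groups = KeySort[GroupBy[Thread[pts -> digits], Last -> First]];
  ListPlot[Values[groups], PlotLegends -> Keys[groups], PlotLabel -> title]];

{plotByDigit[tight, "MinDistance 0.1"], plotByDigit[loose, "MinDistance 0.8"]}
```

With the larger value, points within each digit cluster should appear more spread out, in line with the description of "MinDistance" in the details section.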

"NeighborsNumber"  (1)

Load the Fisher Iris dataset from ExampleData:

Generate a reducer function using the "UMAP" method:

Group the examples by their species:

Reduce the dimension of the features:

Visualize the reduced dataset:

Perform the same operation using a different number of nearest neighbors to construct the high-dimensional graph:
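
A sketch of the full sequence, assuming the standard ExampleData form of the iris dataset (a list of rules from feature vectors to species). The value 50 for "NeighborsNumber" is an arbitrary choice for comparison with the default of 15.

```wolfram
(* Fisher's iris data as a list of rules: features -> species *)
iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];
features = Keys[iris];

(* Reducer trained with the default "NeighborsNumber" -> 15 *)
reducer = DimensionReduction[features, 2, Method -> "UMAP"];

(* Group the feature vectors by species, then reduce each group *)
bySpecies = GroupBy[iris, Last -> First];
reduced = Map[reducer, bySpecies];
ListPlot[Values[reduced], PlotLegends -> Keys[reduced]]

(* Same pipeline with more neighbors; 50 is an illustrative value *)
reducer50 = DimensionReduction[features, 2,
   Method -> {"UMAP", "NeighborsNumber" -> 50}];
reduced50 = Map[reducer50, bySpecies];
ListPlot[Values[reduced50], PlotLegends -> Keys[reduced50]]
```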

Applications  (1)

Data Visualization  (1)

Reduce the dimension of some images using the "UMAP" method:

Visualize the two-dimensional representation of images: