Details & Suboptions

  • "SpanningTree" is a neighbor-based clustering method. "SpanningTree" works for arbitrary cluster shapes and sizes; however, it can fail when clusters have different densities or are loosely connected.
  • The following plots show the results of the "SpanningTree" method applied to toy datasets:
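  • The algorithm finds the set of clusters for which neighboring clusters are the most distant from each other. The distance $d_{ij}$ between two neighboring clusters i and j is defined as the distance between their closest points: $d_{ij} = \min_{x \in C_i,\, y \in C_j} d(x, y)$, where $C_i$ and $C_j$ denote the point sets of clusters i and j, and $d$ is the distance between data points.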
  • Formally, the "SpanningTree" method constructs the minimum spanning tree of the data points, using pairwise distances as edge weights. The longest edges of the tree are then pruned, and each remaining connected component corresponds to a cluster. Pruning stops when the specified number of clusters is reached. When the number of clusters is not specified, pruning stops when all remaining edges are shorter than a given threshold.
  • The option DistanceFunction can be used to define which distance to use.
  • The following suboption can be given:
  • "MaxEdgeLength"    Automatic    pruning length threshold
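
The pruning procedure described in the notes above can be sketched outside the Wolfram Language. The following Python sketch is an illustration only and is not the implementation behind "SpanningTree". The names `minimum_spanning_tree`, `spanning_tree_clusters`, `n_clusters` and `max_edge_length` are hypothetical, the tree is built with Prim's algorithm on the complete graph, and the sketch assumes the caller supplies either a cluster count or a length threshold (it does not attempt to reproduce the Automatic setting).

```python
import math


def minimum_spanning_tree(points, dist=math.dist):
    """Prim's algorithm on the complete graph; returns (length, i, j) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (distance to the tree, nearest tree vertex)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        # Update the best known connection of every remaining vertex.
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges


def spanning_tree_clusters(points, n_clusters=None, max_edge_length=None,
                           dist=math.dist):
    """Prune the longest tree edges; each connected component is a cluster."""
    edges = sorted(minimum_spanning_tree(points, dist))  # shortest first
    if n_clusters is not None:
        # Removing the k - 1 longest edges leaves k components.
        kept = edges[: max(len(points) - n_clusters, 0)]
    elif max_edge_length is not None:
        # Assumption: edges exactly at the threshold are pruned too.
        kept = [e for e in edges if e[0] < max_edge_length]
    else:
        raise ValueError("give n_clusters or max_edge_length")

    # Connected components of the kept edges, via union-find.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for _, i, j in kept:
        parent[find(i)] = find(j)

    clusters = {}
    for idx, p in enumerate(points):
        clusters.setdefault(find(idx), []).append(p)
    return list(clusters.values())


sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (30, 30)]
print(spanning_tree_clusters(sample, n_clusters=3))
print(spanning_tree_clusters(sample, max_edge_length=5))
```

With these six sample points, asking for three clusters separates the tight group near the origin, the pair near (10, 10), and the isolated point at (30, 30). A threshold of 5 gives the same partition, because the only tree edges longer than 5 are the two that bridge those groups.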

Examples

Basic Examples  (3)

Find clusters of nearby values using the "SpanningTree" method:

Train a ClassifierFunction using the "SpanningTree" method:

Obtain the cluster assignment and cluster the data:

Create random 2D vectors:

Plot clusters identified by the "SpanningTree" method:

Scope  (2)

Find cluster indices using ClusteringComponents:

Create and visualize noisy 2D moon-shaped training and test datasets:

Train a ClassifierFunction using "SpanningTree" and find clusters in the test set:

Visualize the two intertwined clusters found by "SpanningTree":

Options  (2)

DistanceFunction  (1)

Cluster data using Manhattan distance:

"MaxEdgeLength"  (1)

Find clusters by specifying the "MaxEdgeLength" suboption:

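
To make the interplay between a length threshold and the choice of distance concrete, here is a small Python sketch. It is an illustration under stated assumptions, not Wolfram Language code and not the method's implementation: `manhattan` and `threshold_clusters` are hypothetical names. It relies on a standard property of minimum spanning trees: removing every tree edge of length at least t leaves the same connected components as linking all pairs of points closer than t. The tree therefore never needs to be built explicitly here.

```python
import math
from itertools import combinations


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def threshold_clusters(points, max_edge_length, dist=manhattan):
    """Group points connected by chains of steps shorter than max_edge_length."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points that lies closer than the threshold.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) < max_edge_length:
            parent[find(i)] = find(j)

    groups = {}
    for idx, p in enumerate(points):
        groups.setdefault(find(idx), []).append(p)
    return list(groups.values())


data = [(0, 0), (1, 1), (2, 2), (6, 6), (7, 7)]
print(threshold_clusters(data, max_edge_length=6))                  # Manhattan
print(threshold_clusters(data, max_edge_length=6, dist=math.dist))  # Euclidean
```

The gap between (2, 2) and (6, 6) is 8 under the Manhattan distance but about 5.66 under the Euclidean distance, so the same threshold of 6 yields two clusters in the first call and a single cluster in the second.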