"DBSCAN" (Machine Learning Method)
- Method for FindClusters, ClusterClassify and ClusteringComponents.
- Partitions data into clusters of similar elements using density-based spatial clustering of applications with noise (DBSCAN).
Details & Suboptions
- "DBSCAN" (density-based spatial clustering of applications with noise) is a density-based clustering method in which the density around each point is estimated from the number of neighbors it has. "DBSCAN" can find clusters of arbitrary shape and size, but it requires the clusters to have similar densities.
- The following plots show the results of the "DBSCAN" method applied to toy datasets (black points indicate outliers):
- "DBSCAN" defines "core points" as data points that have more than k neighbors within a ball of radius ϵ (i.e. data points in high-density regions). Core points that are at a distance of less than ϵ from each other, either directly or through a chain of other core points, define a cluster. Furthermore, any point that is at a distance of less than ϵ from a core point belongs to the cluster of that core point. Any point that is not near a core point is considered noise.
- As a result, each cluster contains one or more core points, which form its core, and possibly some non-core points at its "edge". Overall, "DBSCAN" defines clusters as connected high-density regions. In the following figure, core points are red, edge points are yellow and noise points are blue:
- In ClusteringComponents and ClusterClassify, noise points are labeled Missing["Anomalous"].
- In FindClusters, noise points are returned as a cluster.
- The option DistanceFunction can be used to define which distance to use.
- The following suboptions can be given:
  "NeighborhoodRadius"     Automatic    radius ϵ
  "NeighborsNumber"        Automatic    number of neighbors k
  "DropAnomalousValues"    False        whether to drop outliers
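The definitions of core, edge and noise points given in the Details can be sketched in a few lines of Python. This is a minimal, language-neutral illustration of the described algorithm, not the Wolfram Language implementation. The function name `dbscan` and the `dist` parameter (a stand-in for DistanceFunction, defaulting to Euclidean `math.dist`) are hypothetical. The sketch also assumes that a point is not counted as its own neighbor, and that an edge point near two clusters simply joins whichever cluster reaches it first.

```python
import math

def dbscan(points, eps, k, dist=math.dist):
    """Hypothetical sketch of DBSCAN as described in the Details.

    Returns (labels, roles): labels[i] is a cluster index or None for noise,
    and roles[i] is "core", "edge" or "noise".
    Assumption: a point is not counted as its own neighbor.
    """
    n = len(points)
    # Indices of the points at a distance of less than eps from each point.
    nbrs = [[j for j in range(n) if j != i and dist(points[i], points[j]) < eps]
            for i in range(n)]
    # Core points have more than k neighbors inside their eps-ball.
    core = [len(nb) > k for nb in nbrs]

    labels = [None] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] is not None:
            continue
        # Grow a new cluster from an unvisited core point: every point within
        # eps of a core point joins it, and only core points keep expanding it.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in nbrs[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    if core[q]:
                        stack.append(q)
        cluster += 1

    # Labeled non-core points are edge points; unlabeled points are noise.
    roles = ["core" if core[i] else "edge" if labels[i] is not None else "noise"
             for i in range(n)]
    return labels, roles

line = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
labels, roles = dbscan(line, eps=1.5, k=1)
print(labels)  # [0, 0, 0, 0, None]
print(roles)   # ['edge', 'core', 'core', 'edge', 'noise']
```

In this example the two inner points each have two neighbors and are therefore core points. The two end points have only one neighbor each but lie within ϵ of a core point, so they become edge points. The isolated point at x = 10 is noise.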
Examples
Basic Examples (3)
Find clusters of nearby values using the "DBSCAN" method:
Train the ClassifierFunction on a list of colors using the "DBSCAN" method:
Scope (2)
Obtain a random list of times:
Train the ClassifierFunction using the "DBSCAN" method:
Obtain the cluster assignment and cluster the data:
Train the ClassifierFunction using the "DBSCAN" method:
Noise points are labeled as Missing["Anomalous"]:
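The two ways of reporting noise described in the Details can be sketched in Python. ClusteringComponents and ClusterClassify give one label per point, with noise marked as Missing["Anomalous"]. FindClusters returns the noise points as a cluster of their own. This is an illustration only. It assumes per-point labels in which None marks noise; the string `ANOMALOUS` is a stand-in for the Wolfram Language expression; and the function names are hypothetical.

```python
from collections import defaultdict

# Stand-in for the Wolfram Language expression Missing["Anomalous"].
ANOMALOUS = 'Missing["Anomalous"]'

def component_labels(labels):
    """Per-point output: cluster labels, with noise (None) marked as anomalous."""
    return [ANOMALOUS if lab is None else lab for lab in labels]

def cluster_lists(points, labels):
    """Grouped output: points grouped by cluster, with noise as one extra group."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    clusters = [groups[key] for key in sorted(key for key in groups if key is not None)]
    if None in groups:
        clusters.append(groups[None])  # placing the noise group last is an arbitrary choice
    return clusters

points = ["a", "b", "c", "d"]
labels = [0, None, 1, 0]  # assumed convention: None marks a noise point
print(component_labels(labels))       # [0, 'Missing["Anomalous"]', 1, 0]
print(cluster_lists(points, labels))  # [['a', 'd'], ['c'], ['b']]
```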
Options (7)
DistanceFunction (1)
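The distance function decides which points fall inside each ϵ-ball, so it directly changes which points count as neighbors and, in turn, which points are core points. The following Python sketch shows one point's ϵ-neighborhood under Euclidean distance (`math.dist`) and under a hypothetical Manhattan-distance helper. It illustrates the idea only and does not reproduce how DistanceFunction is handled internally.

```python
import math

def manhattan(a, b):
    """Hypothetical alternative metric: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def eps_neighbors(points, i, eps, dist):
    """Indices of points at a distance of less than eps from points[i] (itself excluded)."""
    return [j for j, q in enumerate(points) if j != i and dist(points[i], q) < eps]

points = [(0, 0), (1, 1), (1.2, 0), (3, 3)]
print(eps_neighbors(points, 0, 1.5, math.dist))  # Euclidean: [1, 2]
print(eps_neighbors(points, 0, 1.5, manhattan))  # Manhattan: [2]
```

The point (1, 1) lies at Euclidean distance √2 ≈ 1.41 from the origin but at Manhattan distance 2. It is therefore a neighbor under the first metric and not under the second.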
"NeighborhoodRadius" (2)
Find clusters by specifying the "NeighborhoodRadius" suboption:
Define a set of two-dimensional data points, characterized by four somewhat nebulous clusters:
Plot clusters in data found using the "DBSCAN" method:
Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborhoodRadius":
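The effect of the radius ϵ can be illustrated with a compact Python sketch of the algorithm from the Details. It counts clusters and noise points on a small hypothetical dataset of two squares plus one far-away point as ϵ grows. With a tiny ϵ no point has enough neighbors, so everything is noise. A moderate ϵ separates the two squares. A large ϵ merges them into one cluster. The function `dbscan_labels` is a hypothetical illustration, not the Wolfram Language implementation.

```python
import math

def dbscan_labels(points, eps, k):
    """Compact DBSCAN sketch: cluster index per point, None for noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if j != i and math.dist(points[i], points[j]) < eps]
            for i in range(n)]
    core = [len(nb) > k for nb in nbrs]  # more than k neighbors within eps
    labels, cluster = [None] * n, 0
    for i in range(n):
        if core[i] and labels[i] is None:
            # Expand a new cluster through chains of core points.
            labels[i], stack = cluster, [i]
            while stack:
                for q in nbrs[stack.pop()]:
                    if labels[q] is None:
                        labels[q] = cluster
                        if core[q]:
                            stack.append(q)
            cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
summary = {}
for eps in (0.5, 1.5, 20):
    labels = dbscan_labels(points, eps, k=2)
    n_clusters = len({lab for lab in labels if lab is not None})
    summary[eps] = (n_clusters, labels.count(None))  # (clusters, noise points)
print(summary)  # {0.5: (0, 9), 1.5: (2, 1), 20: (1, 1)}
```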
"NeighborsNumber" (3)
Find clusters by specifying the "NeighborsNumber" suboption:
Plot clusters in data found using the "DBSCAN" method:
Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborsNumber":
Define a set of two-dimensional data points, characterized by four somewhat nebulous clusters:
Plot clusters in data using the "DBSCAN" method:
Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborsNumber":
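Why k matters can be seen from the core-point test alone: raising k makes the test stricter, so fewer points qualify as core points and more of the data ends up as edge points or noise. The Python sketch below counts core points for several values of k at a fixed ϵ on a small hypothetical dataset. It uses the "more than k neighbors" rule from the Details and assumes a point is not its own neighbor. The function name `core_flags` is hypothetical.

```python
import math

def core_flags(points, eps, k):
    """True for points with more than k neighbors at distance < eps (point itself excluded)."""
    return [sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) < eps) > k
            for i, p in enumerate(points)]

# A dense group of five points and a sparse pair.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6)]
core_counts = {k: sum(core_flags(points, eps=1.5, k=k)) for k in (0, 2, 4)}
print(core_counts)  # {0: 7, 2: 5, 4: 0}
```

Each of the five dense points has four neighbors, and each point of the sparse pair has one. Therefore k = 2 keeps only the dense group as core points, and k = 4 leaves no core points at all.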
"DropAnomalousValues" (1)
Train the ClassifierFunction, which labels outliers as Missing["Anomalous"]:
Use the trained ClassifierFunction to identify the outliers:
Train the ClassifierFunction with outliers dropped, then find the new cluster assignments:
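Conceptually, dropping anomalous values means removing the noise points from the result instead of labeling them. The Python sketch below shows only that filtering step. It assumes per-point labels in which None marks noise, and the function name `drop_anomalous` is hypothetical. The sketch does not claim how "DropAnomalousValues" -> True retrains or reassigns clusters internally.

```python
def drop_anomalous(points, labels):
    """Keep only the points that received a cluster label (None marks noise)."""
    kept = [(p, lab) for p, lab in zip(points, labels) if lab is not None]
    return [p for p, _ in kept], [lab for _, lab in kept]

points = [(0, 0), (5, 5), (0, 1)]
labels = [0, None, 0]
print(drop_anomalous(points, labels))  # ([(0, 0), (0, 1)], [0, 0])
```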