"DBSCAN" (Machine Learning Method)
- Method for FindClusters, ClusterClassify and ClusteringComponents.
- Partitions data into clusters of similar elements using density-based spatial clustering of applications with noise (DBSCAN).
Details & Suboptions
- "DBSCAN" (density-based spatial clustering of applications with noise) is a density-based clustering method where the density is estimated using a neighbor-based approach. "DBSCAN" works for arbitrary cluster shapes and sizes but requires clusters to have similar densities.
- The following plots show the results of the "DBSCAN" method applied to toy datasets (black points indicate outliers):
- "DBSCAN" defines "core points" as data points that have more than k neighbors within a ball of radius ϵ (i.e. data points in high-density regions). Core points that are at a distance of less than ϵ from each other belong to the same cluster. Furthermore, any point that is at a distance of less than ϵ from a core point belongs to the cluster of that core point. Any point that is not within ϵ of a core point is considered noise.
- This results in each cluster containing one or more core points at its core and some non-core points at its "edge". Overall, "DBSCAN" defines clusters as connected high-density regions. In the following figure, core points are red, edge points are yellow and noise points are blue:
- In ClusteringComponents and ClusterClassify, noise points are labeled Missing["Anomalous"].
- In FindClusters, noise points are returned as a cluster.
- The option DistanceFunction can be used to define which distance to use.
- The following suboptions can be given:
"NeighborhoodRadius"     Automatic    radius ϵ
"NeighborsNumber"        Automatic    number of neighbors k
"DropAnomalousValues"    False        whether to drop outliers
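The core-point definition and the suboptions can be illustrated with a minimal sketch. The dataset, the values of eps and k, and the choice not to count a point as its own neighbor are all assumptions made for illustration; the hand-written core-point test follows the definition given above and is not necessarily how the built-in method is implemented.

```wolfram
(* Hypothetical data: two dense blobs plus one isolated point *)
SeedRandom[1];
data = Join[
   RandomVariate[NormalDistribution[0, 0.2], {50, 2}],
   RandomVariate[NormalDistribution[3, 0.2], {50, 2}],
   {{10., 10.}}];

(* Assumed parameter values, chosen only for this example *)
eps = 0.5; k = 5;

(* Count the neighbors of each point within radius eps;
   subtract 1 so that the point itself is not counted (an assumption) *)
nf = Nearest[data];
neighborCounts = Map[Function[p, Length[nf[p, {All, eps}]] - 1], data];

(* Core points: more than k neighbors within radius eps *)
corePoints = Pick[data, Thread[neighborCounts > k]];

(* Pass the same radius and neighbor count to the built-in method *)
clusters = FindClusters[data,
   Method -> {"DBSCAN", "NeighborhoodRadius" -> eps, "NeighborsNumber" -> k},
   DistanceFunction -> EuclideanDistance];
Length /@ clusters
```

Leaving both suboptions at Automatic lets the method choose ϵ and k from the data; setting them explicitly, as here, is useful when the expected density of the clusters is known.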
Examples
Basic Examples (3)
Train the ClassifierFunction on a list of colors using the "DBSCAN" method:
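The original input is not reproduced here, so the following is only a minimal sketch. The list colors is hypothetical: two groups of similar colors and one color far from both.

```wolfram
(* Hypothetical data: reddish colors, bluish colors and one isolated green *)
colors = {Red, Darker[Red, 0.1], Lighter[Red, 0.1],
   Blue, Darker[Blue, 0.1], Lighter[Blue, 0.1], Green};

(* Train a ClassifierFunction with the "DBSCAN" method *)
c = ClusterClassify[colors, Method -> "DBSCAN"]
```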
Use the trained ClassifierFunction to identify the outliers:
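A minimal sketch, assuming a hypothetical list colors and a classifier c trained on it with the "DBSCAN" method. It relies on the fact stated above that ClusterClassify labels noise points Missing["Anomalous"]; which colors are flagged depends on the automatically chosen ϵ and k.

```wolfram
(* Hypothetical data and classifier, defined here so the example is self-contained *)
colors = {Red, Darker[Red, 0.1], Lighter[Red, 0.1],
   Blue, Darker[Blue, 0.1], Lighter[Blue, 0.1], Green};
c = ClusterClassify[colors, Method -> "DBSCAN"];

(* Cluster labels for each color; noise points get Missing["Anomalous"] *)
labels = c[colors];

(* Keep only the colors that were labeled as noise *)
Pick[colors, labels, Missing["Anomalous"]]
```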
Train the ClassifierFunction by dropping outliers and finding new cluster assignments:
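A minimal sketch using the "DropAnomalousValues" suboption listed above. The list colors is hypothetical, and the resulting cluster assignments depend on the automatically chosen ϵ and k.

```wolfram
(* Hypothetical data: reddish colors, bluish colors and one isolated green *)
colors = {Red, Darker[Red, 0.1], Lighter[Red, 0.1],
   Blue, Darker[Blue, 0.1], Lighter[Blue, 0.1], Green};

(* Drop outliers during training instead of labeling them as anomalous *)
c2 = ClusterClassify[colors,
   Method -> {"DBSCAN", "DropAnomalousValues" -> True}];

(* Cluster assignments from the classifier trained without outliers *)
c2[colors]
```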
Introduced in 2020