Unsupervised Machine Learning
Unsupervised machine learning analyzes unlabeled data to discover hidden relationships: patterns, clusters of similar examples, underlying data distributions or simpler data representations. Common use cases include disease diagnosis, market basket analysis, consumer grouping and anomaly detection. Unsupervised machine learning also helps with data visualization. The Wolfram Language offers a large collection of unsupervised learning methods, accessible via goal-based functions that automate much of the processing pipeline (feature selection and extraction, model selection, cross-validation, …) and make it possible to analyze all kinds of data, such as text, images or graphs, beyond standard arrays.
Cluster Analysis »
ClusterClassify — classify data into clusters
ClusteringMeasurements — analyze the result of a clustering process
FindClusters ▪ ClusteringTree ▪ ClusteringComponents ▪ ...
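As a minimal sketch of the clustering functions listed above, the following partitions synthetic 2D points with FindClusters and trains a reusable cluster classifier with ClusterClassify. The data, the seed and the choice of two clusters are illustrative assumptions, not part of this guide.

```wolfram
(* Synthetic data (assumed for illustration): two well-separated 2D groups *)
SeedRandom[1];
data = Join[
   RandomVariate[MultinormalDistribution[{0, 0}, IdentityMatrix[2]], 50],
   RandomVariate[MultinormalDistribution[{10, 10}, IdentityMatrix[2]], 50]];

(* Partition the points into 2 clusters and count the members of each *)
clusters = FindClusters[data, 2];
Length /@ clusters

(* Learn a classifier that assigns new points to one of 2 clusters *)
c = ClusterClassify[data, 2];
c[{0.5, -0.2}]
```

FindClusters returns the grouped data directly, while ClusterClassify returns a function that can be applied to examples not seen during training.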
Feature Extraction
FeatureExtraction — find how to extract features from data
FeatureExtract ▪ FeatureExtractorFunction ▪ FeatureNearest
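A short sketch of how the feature extraction functions above might be used on mixed numeric and nominal data. The example rows are made up for illustration; the extractor that is learned, and the numeric vectors it produces, depend on the automatic choices made by the system.

```wolfram
(* Illustrative examples, each with one numeric and one nominal feature *)
examples = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* Learn a feature extractor from the examples *)
fe = FeatureExtraction[examples];

(* Apply the learned extractor to a single example *)
fe[{1.4, "A"}]

(* Learn and apply in one step, returning features for every example *)
FeatureExtract[examples]
```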
Anomaly Detection
AnomalyDetection — learn an anomaly detector function from data
FindAnomalies ▪ DeleteAnomalies ▪ AnomalyDetectorFunction ▪ RarerProbability
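The anomaly detection functions above can be sketched on one-dimensional data as follows. The sample, with two extreme values appended by hand, is an assumption chosen for illustration; which values are flagged depends on the detector that is learned.

```wolfram
(* Mostly standard-normal values plus two hand-placed extreme values *)
SeedRandom[1];
data = Join[RandomVariate[NormalDistribution[0, 1], 200], {8.5, -9.}];

(* List the examples considered anomalous *)
FindAnomalies[data]

(* Return the data with anomalous examples removed *)
cleaned = DeleteAnomalies[data];

(* Learn a reusable detector and test a new value (gives True or False) *)
ad = AnomalyDetection[data];
ad[10.]
```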
Dimensionality Reduction
DimensionReduction — find how to project data onto a lower-dimensional space
DimensionReduce ▪ DimensionReducerFunction
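A minimal sketch of the two dimensionality reduction entry points above, assuming random 5-dimensional vectors as input and a target dimension of 2 (both chosen only for illustration).

```wolfram
(* 100 random 5-dimensional vectors (illustrative data) *)
SeedRandom[1];
data = RandomReal[1, {100, 5}];

(* Reduce the whole dataset in one step: one 2-component vector per row *)
reduced = DimensionReduce[data, 2];
Dimensions[reduced]

(* Learn a reusable reducer and apply it to a new vector *)
dr = DimensionReduction[data, 2];
dr[RandomReal[1, 5]]
```

DimensionReduce is convenient for a one-off projection; DimensionReduction is the better fit when the same projection must be applied to new data later.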
Distribution Modeling
LearnDistribution — learn the underlying distribution of any type of data
FindDistribution — find a representation for data in terms of named distributions
SynthesizeMissingValues — fill in missing values by imputing from existing data
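The distribution modeling functions above might be used as in the following sketch. The normal sample and the small table with Missing[] entries are assumptions made for illustration.

```wolfram
(* Sample from a known distribution so the results are easy to sanity-check *)
SeedRandom[1];
sample = RandomVariate[NormalDistribution[2, 0.5], 500];

(* Learn a general-purpose distribution, then sample from it and query its density *)
ld = LearnDistribution[sample];
RandomVariate[ld, 3]
PDF[ld, 2.]

(* Look for a named (symbolic) distribution that describes the sample *)
FindDistribution[sample]

(* Fill in Missing[] entries using the structure of the remaining data *)
table = {{1.1, 2.0}, {1.9, 4.1}, {3.0, Missing[]}, {4.2, 8.1}, {Missing[], 10.2}};
SynthesizeMissingValues[table]
```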
Visualization
FeatureSpacePlot — visualize dimension-reduced feature space in 2D
FeatureSpacePlot3D — visualize dimension-reduced feature space in 3D
FeatureValueImpactPlot — plot the impact of a feature value on a model result
FeatureImpactPlot — plot the impact of each feature together
CumulativeFeatureImpactPlot — plot the cumulative impact of each feature
FeatureValueDependencyPlot — plot the result dependency on a feature value
Dendrogram — visualize hierarchical clusters
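As a small illustration of two of the visualization functions above, the following plots a list of named colors in a learned 2D feature space and as a hierarchical clustering. The choice of colors as input is an assumption made to keep the example self-contained; the feature impact plots are not shown here because they require a trained model.

```wolfram
(* A small, self-contained dataset of non-numeric examples *)
colors = {Red, Orange, Yellow, Green, Cyan, Blue, Purple, Pink};

(* Place each color in a dimension-reduced 2D feature space *)
FeatureSpacePlot[colors]

(* Show the hierarchical clustering of the same colors *)
Dendrogram[colors]
```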
Specific Unsupervised Learning Methods
FindGraphCommunities — find communities or clusters in graphs
SmoothKernelDistribution — find kernel density estimates for data
FindHiddenMarkovStates — infer hidden Markov states from a sequence of data
Eigensystem ▪ SingularValueDecomposition ▪ PrincipalComponents ▪ KarhunenLoeveDecomposition
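The more specialized functions above can be sketched as follows. The graph (two 5-vertex cliques joined by a single edge) and the random numeric data are constructed purely for illustration.

```wolfram
(* Two 5-vertex cliques joined by one bridging edge *)
shifted = EdgeList[CompleteGraph[5]] /.
   UndirectedEdge[a_, b_] :> UndirectedEdge[a + 5, b + 5];
g = Graph[Join[EdgeList[CompleteGraph[5]], shifted, {UndirectedEdge[5, 6]}]];

(* Group the vertices into communities *)
FindGraphCommunities[g]

(* Kernel density estimate of a 1D sample, evaluated at a point *)
SeedRandom[1];
sample = RandomVariate[NormalDistribution[0, 1], 200];
skd = SmoothKernelDistribution[sample];
PDF[skd, 0.]

(* Express a data matrix in terms of its principal components *)
m = RandomReal[1, {50, 3}];
pc = PrincipalComponents[m];
Dimensions[pc]
```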
Unsupervised Learning Methods »
"GaussianMixture" — use a mixture of Gaussian (normal) distributions
"KernelDensityEstimation" — use a kernel mixture distribution
"Multinormal" — use a multivariate normal (Gaussian) distribution