Supervised Machine Learning


Supervised machine learning uses mathematical models trained on labeled datasets to classify data or predict outcomes. It is used to solve problems such as score estimation (customer satisfaction, quality assessment, etc.), forecasting (prices, agricultural yields, etc.) and data classification (spam detection, detection of copyrighted or violent content, etc.). The Wolfram Language supports all the most common supervised learning algorithms, packaged into high-level functions that automatically handle tasks such as missing-data imputation, feature selection and extraction, model selection and cross-validation.

Classification

Classify classify data into categories using a built-in classifier or learning from examples

ClassifierFunction symbolic representation of a classifier to be applied to data

ClassifierMeasurements measure classifier performance on test data
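A minimal sketch of the train, apply and evaluate cycle, assuming a small hypothetical dataset of numeric inputs labeled "low" or "high" (the names trainingData and testData are illustrative only). Classify chooses a method automatically, so the exact probabilities and accuracy depend on the method it selects.

```wolfram
(* Hypothetical toy data: a numeric feature mapped to a class label *)
trainingData = {1.0 -> "low", 1.4 -> "low", 2.1 -> "low",
   6.8 -> "high", 7.5 -> "high", 8.2 -> "high"};
testData = {1.2 -> "low", 7.9 -> "high"};

(* Classify picks a method automatically and returns a ClassifierFunction *)
c = Classify[trainingData];

(* Apply the classifier to one new input, then to a batch of inputs *)
c[1.7]
c[{1.7, 7.1}]

(* Class probabilities for a single input *)
c[1.7, "Probabilities"]

(* Evaluate on held-out labeled examples *)
ClassifierMeasurements[c, testData, "Accuracy"]
```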

Regression

Predict predict values from data using a built-in predictor or learning from examples

PredictorFunction symbolic representation of a predictor to be applied to data

PredictorMeasurements measure predictor performance on test data
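The regression workflow mirrors classification. This sketch assumes hypothetical data lying roughly on y = 2 x + 1; the names trainingData and testData are illustrative. Besides a point estimate, a PredictorFunction can return a full predictive distribution.

```wolfram
(* Hypothetical toy data: roughly y = 2 x + 1 with some noise *)
trainingData = {1 -> 3.1, 2 -> 4.9, 3 -> 7.2, 4 -> 8.8, 5 -> 11.1, 6 -> 13.0};
testData = {1.5 -> 4.0, 4.5 -> 10.1};

(* Predict picks a method automatically and returns a PredictorFunction *)
p = Predict[trainingData];

(* Point prediction for a new input *)
p[3.5]

(* Predictive distribution instead of a single value *)
p[3.5, "Distribution"]

(* Standard deviation of the residuals on held-out examples *)
PredictorMeasurements[p, testData, "StandardDeviation"]
```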

Object Detection

TrainImageContentDetector, TrainTextContentDetector train custom detectors

ContentDetectorFunction symbolic representation of a detector to be applied to data

Sequence Forecasting

SequencePredict predict subsequent elements from sequence examples

SequencePredictorFunction symbolic representation of a sequence predictor
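A minimal sketch of sequence prediction, assuming a single hypothetical training sequence with a repeating A, B, C pattern. It uses the "NextElement" and "NextElementProbabilities" properties of SequencePredictorFunction; the exact probabilities depend on the method SequencePredict selects.

```wolfram
(* Hypothetical training data: one sequence with a repeating pattern *)
sp = SequencePredict[{Characters["ABCABCABCABCABC"]}];

(* Most likely element to follow the prefix A, B *)
sp[{"A", "B"}, "NextElement"]

(* Probabilities of each possible next element *)
sp[{"A", "B"}, "NextElementProbabilities"]
```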

Learning from Actions

BayesianMinimization model-based minimization of arbitrary objective functions

ActiveClassification learn a classifier by actively probing a system

ActivePrediction learn a predictor by actively probing a system

ActiveClassificationObject  ▪  ActivePredictionObject
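BayesianMinimization is aimed at objectives that are expensive to evaluate: it fits a surrogate model to the evaluations made so far and uses it to choose the next point to try. This sketch assumes a cheap hypothetical objective f standing in for an expensive one and searches over an Interval region; ActiveClassification and ActivePrediction are not shown here.

```wolfram
(* Hypothetical objective; in practice each call might be a costly experiment *)
f = Function[x, (x - 2)^2 + Sin[5 x]];

(* Model-based minimization over 0 <= x <= 5 *)
res = BayesianMinimization[f, Interval[{0, 5}]];

(* Best configuration found and the corresponding objective value *)
res["MinimumConfiguration"]
res["MinimumValue"]
```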

Specific Supervised Learning Methods

Nearest, NearestNeighborGraph find nearest neighbors

FindFit find a generalized nonlinear fit

LinearModelFit  ▪  LogitModelFit  ▪  NonlinearModelFit  ▪  GeneralizedLinearModelFit  ▪  ProbitModelFit

TimeSeriesModelFit fit a wide variety of time series models

Interpolation find an interpolation of values in a dataset

FindFormula find a simple symbolic formula for data

FindSequenceFunction find a function to reproduce a discrete sequence

FindHiddenMarkovStates find the most probable hidden-state path in a hidden Markov model
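A few of the specific methods above applied to small hypothetical inputs. The data points are invented to lie near y = 2 x + 1; LinearModelFit and FindFit both fit a straight line here, but LinearModelFit returns a full model object, while FindFit returns only replacement rules for the named parameters.

```wolfram
(* Make sure the symbols used as variables and parameters are undefined *)
ClearAll[x, a, b, n];

(* Nearest: the element of the list closest to 5 *)
Nearest[{1, 4, 9, 16}, 5]  (* {4} *)

(* Hypothetical {x, y} points lying near y = 2 x + 1 *)
data = {{1, 3.1}, {2, 4.9}, {3, 7.2}, {4, 8.8}};

(* LinearModelFit: least-squares fit of y = b0 + b1 x *)
lm = LinearModelFit[data, x, x];
lm["BestFitParameters"]

(* FindFit: fit the parameters a and b of the explicit model a x + b *)
FindFit[data, a x + b, {a, b}, x]

(* FindSequenceFunction: recover a closed form for a discrete sequence *)
FindSequenceFunction[{1, 4, 9, 16, 25}, n]  (* n^2 *)
```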

Supervised Learning Methods

"DecisionTree" use a decision tree

"LogisticRegression" use probabilities from linear combinations of features

"RandomForest" use Breiman–Cutler ensembles of decision trees

"SupportVectorMachine" classify using a support vector machine

"GradientBoostedTrees"  ▪  "NearestNeighbors"  ▪  "Markov"  ▪  ...
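These method names are passed to Classify (and, where applicable, Predict) through the Method option. This sketch assumes a hypothetical two-class dataset with two numeric features and trains one classifier per method so their answers can be compared on the same point.

```wolfram
(* Hypothetical two-class data with two numeric features *)
data = {{1, 1} -> "A", {1, 2} -> "A", {2, 1} -> "A", {2, 2} -> "A",
   {6, 6} -> "B", {6, 7} -> "B", {7, 6} -> "B", {7, 7} -> "B"};

(* Train one classifier per explicitly chosen method *)
methods = {"DecisionTree", "LogisticRegression", "RandomForest",
   "SupportVectorMachine", "NearestNeighbors"};
classifiers = AssociationMap[Classify[data, Method -> #] &, methods];

(* Query every classifier on the same new point *)
Map[#[{1.5, 1.5}] &, classifiers]
```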

Machine Learning Options

AnomalyDetector how to detect anomalies in input data

ComputeUncertainty whether to return values with their uncertainty (as Around objects)

FeatureExtractor how to extract features to learn from

FeatureTypes feature types to assume for input data

MissingValuePattern specify how missing values are represented in data

MissingValueSynthesis how to synthesize missing values

PerformanceGoal whether to optimize for memory, quality or speed

RandomSeeding how to seed randomization

RecalibrationFunction how to post-process model predictions

TimeGoal how much time to spend on training and related computations
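Several of these options can be combined in one training call. This is a sketch assuming hypothetical regression data; it assumes TimeGoal accepts a plain number interpreted as seconds, and the seed value 1234 is arbitrary. ComputeUncertainty is given when the trained PredictorFunction is applied, not when it is trained.

```wolfram
(* Hypothetical regression data *)
data = {1 -> 3.1, 2 -> 4.9, 3 -> 7.2, 4 -> 8.8, 5 -> 11.1, 6 -> 13.0};

(* Favor model quality, fix the random seed for reproducible training,
   and aim to spend about 10 seconds (assumed unit) on training *)
p = Predict[data, PerformanceGoal -> "Quality", RandomSeeding -> 1234,
   TimeGoal -> 10];

(* Prediction together with its uncertainty, returned as an Around object *)
p[3.5, ComputeUncertainty -> True]
```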