LearnDistribution

LearnDistribution[{example1,example2,…}]

generates a LearnedDistribution[…] that attempts to represent an underlying distribution for the examples given.

Details and Options

  • LearnDistribution can be used on many types of data, including numerical data, nominal data, and images.
  • Each examplei can be a single data element, a list of data elements or an association of data elements. Examples can also be given as a Dataset object.
  • LearnDistribution effectively assumes that each examplei is drawn independently from an underlying distribution, which LearnDistribution attempts to infer.
  • LearnDistribution[examples] yields a LearnedDistribution[…] on which the following functions can be used (a usage sketch is given at the end of this section):
  • PDF[dist,…]    probability or probability density for data
    RandomVariate[dist]    random samples generated from the distribution
    SynthesizeMissingValues[dist,…]    fill in missing values according to the distribution
    RarerProbability[dist,…]    compute the probability of generating a sample with a lower PDF than a given example
  • The following options can be given (an options sketch is given at the end of this section):
  • FeatureExtractor    Identity    how to extract features from which to learn
    FeatureNames    Automatic    feature names to assign for input data
    FeatureTypes    Automatic    feature types to assume for input data
    Method    Automatic    which modeling algorithm to use
    PerformanceGoal    Automatic    aspects of performance to try to optimize
    RandomSeeding    1234    what seeding of pseudorandom generators should be done internally
    TimeGoal    Automatic    how long to spend training the distribution
    TrainingProgressReporting    Automatic    how to report progress during training
    ValidationSet    Automatic    the set of data on which to evaluate the model during training
  • Possible settings for PerformanceGoal include:
  • "DirectTraining"    train directly on the full dataset, without model searching
    "Memory"    minimize storage requirements of the distribution
    "Quality"    maximize the modeling quality of the distribution
    "Speed"    maximize speed for PDF queries
    "SamplingSpeed"    maximize speed for generating random samples
    "TrainingSpeed"    minimize time spent producing the distribution
    Automatic    automatic tradeoff among speed, quality and memory
    {goal1,goal2,…}    automatically combine goal1, goal2, etc.
  • Possible settings for Method include:
  • "Multinormal"    use a multivariate normal (Gaussian) distribution
    "ContingencyTable"    discretize data and store each possible probability
    "KernelDensityEstimation"    use a kernel mixture distribution
    "DecisionTree"    use a decision tree to compute probabilities
    "GaussianMixture"    use a mixture of Gaussian (normal) distributions
  • The following settings for TrainingProgressReporting can be used:
  • "Panel"    show a dynamically updating graphical panel
    "Print"    periodically report information using Print
    "ProgressIndicator"    show a simple ProgressIndicator
    "SimplePanel"    dynamically updating panel without learning curves
    None    do not report any information
  • Possible settings for RandomSeeding include:
  • Automatic    automatically reseed every time the function is called
    Inherited    use externally seeded random numbers
    seed    use an explicit integer or string as a seed
  • Only reversible feature extractors can be given in the option FeatureExtractor.
  • LearnDistribution[…,FeatureExtractor->"Minimal"] indicates that the internal preprocessing should be as simple as possible.
  • All images are first conformed using ConformImages.
  • Information[LearnedDistribution[…]] generates an information panel about the distribution and its estimated performance.
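
A minimal end-to-end sketch of these functions, assuming illustrative data and hypothetical variable names (data, dist):

data = {{1.2, "A"}, {2.3, "A"}, {3.1, "B"}, {4.5, "B"}, {4.8, "B"}};
dist = LearnDistribution[data];
PDF[dist, {3., "B"}]  (* probability density of a new example *)
RandomVariate[dist, 5]  (* five random samples from the learned distribution *)
SynthesizeMissingValues[dist, {Missing[], "A"}]  (* fill in the missing numeric value *)
RarerProbability[dist, {3., "B"}]  (* probability of generating a sample with a lower PDF *)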
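
A sketch of combining several of the options above; the particular settings are illustrative choices rather than recommended defaults:

dist = LearnDistribution[RandomVariate[NormalDistribution[0, 1], 200],
  Method -> "GaussianMixture",
  PerformanceGoal -> "Quality",
  TimeGoal -> 5,  (* spend about five seconds training *)
  TrainingProgressReporting -> "Print",
  RandomSeeding -> 1234]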
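
A sketch of FeatureExtractor->"Minimal" together with an Information query, again on illustrative data:

dist = LearnDistribution[{"cat", "dog", "dog", "fish", "cat"}, FeatureExtractor -> "Minimal"];
Information[dist]  (* information panel about the learned distribution *)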

Examples


Basic Examples  (3)

Train a distribution on a numeric dataset:

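The original input cell is not reproduced here; a sketch with illustrative values could look like this:
dist = LearnDistribution[{1.2, 2.3, 3.1, 4.5, 4.8, 5.2}]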

Generate a new example based on the learned distribution:

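Continuing the sketch above:
RandomVariate[dist]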

Compute the PDF of a new example:

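For instance, evaluating the learned density at a hypothetical point:
PDF[dist, 3.5]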

Train a distribution on a nominal dataset:

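Again, the original input is not shown; an illustrative nominal dataset might be:
dist = LearnDistribution[{"A", "A", "B", "A", "B", "A"}]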

Generate a new example based on the learned distribution:

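Continuing the sketch:
RandomVariate[dist]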

Compute the probability of the examples "A" and "B":

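Illustrative queries for the two nominal values:
{PDF[dist, "A"], PDF[dist, "B"]}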

Train a distribution on a two-dimensional dataset:

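An illustrative two-dimensional dataset (not the original input):
data = {{1.2, 2.1}, {2.3, 4.2}, {3.1, 6.3}, {4.5, 8.7}, {5.0, 10.1}};
dist = LearnDistribution[data]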

Generate a new example based on the learned distribution:

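Continuing the sketch:
RandomVariate[dist]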

Compute the probability of two examples:

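For example, with two hypothetical query points:
{PDF[dist, {2., 4.}], PDF[dist, {5., 1.}]}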

Impute the missing value of an example:

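A sketch of imputing a missing second component (illustrative):
SynthesizeMissingValues[dist, {3., Missing[]}]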

Scope  (3)

Options  (6)

Applications  (4)

Introduced in 2019 (12.0)