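SynthesizeMissingValues[{example1, example2, …}] replaces missing values in each example with generated values.


SynthesizeMissingValues[dist, {example1, example2, …}] uses the distribution dist to generate values.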

Details and Options

  • SynthesizeMissingValues can be used on many types of data, including numerical, nominal and images.
  • Each examplei can be a single data element, a list of data elements or an association of data elements. Examples can also be given as a Dataset object.
  • The following options can be given:
    FeatureNames               Automatic               feature names to assign for input data
    FeatureTypes               Automatic               feature types to assume for input data
    Method                     Automatic               which modeling algorithm to use
    MissingValuePattern        _Missing|Indeterminate  the pattern of the assumed missing values
    PerformanceGoal            Automatic               aspects of performance to optimize
    RandomSeeding              1234                    what seeding of pseudorandom generators should be done internally
    TimeGoal                   Automatic               how long to spend for training
    TrainingProgressReporting  Automatic               how to report progress during training
    ValidationSet              Automatic               the set of data on which to evaluate the model during training
  • Possible settings for PerformanceGoal include:
    "Quality"    maximize the synthesis quality
    "Speed"      maximize the synthesis speed
    Automatic    automatic tradeoff between speed and quality
  • Possible settings for Method include:
    Automatic    automatically choose distribution method and synthesis strategy
    None         do not use any missing-value synthesizer
    method       use the specified method
    strategy     how to synthesize from the distribution
    assoc        specify both distribution method and synthesis strategy
  • Possible settings for method include:
    "Multinormal"              use a multivariate normal (Gaussian) distribution
    "ContingencyTable"         discretize data and store each possible probability
    "KernelDensityEstimation"  use a kernel mixture distribution
    "DecisionTree"             use a decision tree to compute probabilities
    "GaussianMixture"          use a mixture of Gaussian (normal) distributions
  • Possible settings for strategy include:
    "RandomSampling"  randomly sample from the conditioned distribution (default)
    "ModeFinding"     attempt to find the mode of the conditioned distribution
  • In the form Method -> assoc, the association assoc should be of the form <|"LearningMethod" -> method, "EvaluationStrategy" -> strategy|>.
  • The following settings for TrainingProgressReporting can be used:
    "Panel"              show a dynamically updating graphical panel
    "Print"              periodically report information using Print
    "ProgressIndicator"  show a simple ProgressIndicator
    "SimplePanel"        dynamically updating panel without learning curves
    None                 do not report any information
  • Possible settings for RandomSeeding include:
    Automatic    automatically reseed every time the function is called
    Inherited    use externally seeded random numbers
    seed         use an explicit integer or string as a seed
  • SynthesizeMissingValues[…, FeatureExtractor -> "Minimal"] indicates that the internal preprocessing should be as simple as possible.
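The RandomSeeding settings listed above can be illustrated with a short sketch. The data values are assumptions chosen for illustration. With the default fixed seed, repeated calls on the same input would be expected to agree, while Automatic reseeds on each call:

```wolfram
(* Illustrative data with one missing entry *)
data = {{1.1, 1.4}, {2.3, 3.1}, {3, 4}, {Missing[], 5.4}, {8.7, 7.5}};

(* Default RandomSeeding -> 1234: two calls on the same input should be comparable *)
SynthesizeMissingValues[data] === SynthesizeMissingValues[data]

(* RandomSeeding -> Automatic reseeds on every call, so the filled-in value may vary *)
SynthesizeMissingValues[data, RandomSeeding -> Automatic]
```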



Basic Examples  (2)

Fill in missing values in a numeric dataset:
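A minimal sketch of such a call, using illustrative values (the data are an assumption). The generated value is sampled from a learned distribution, so no specific output is shown:

```wolfram
(* The first entry of the fourth row is missing and will be synthesized *)
SynthesizeMissingValues[{{1.1, 1.4}, {2.3, 3.1}, {3, 4}, {Missing[], 5.4}, {8.7, 7.5}}]
```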

Train a distribution on a two-dimensional dataset:

Fill in missing values based on the learned distribution:
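The two steps can be sketched together as follows. The training data and the name ld are illustrative assumptions:

```wolfram
(* Learn a distribution from complete two-dimensional data (illustrative values) *)
ld = LearnDistribution[{{1.1, 1.4}, {2.3, 3.1}, {3, 4}, {5.4, 5.4}, {8.7, 7.5}}];

(* Reuse the learned distribution to fill in missing entries of new examples *)
SynthesizeMissingValues[ld, {{Missing[], 4.2}, {3.5, Missing[]}}]
```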

Scope  (1)

Specify that the missing values to be replaced are integers using MissingValuePattern:
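A possible sketch, with illustrative data in which the only integer entry plays the role of the missing value:

```wolfram
(* With MissingValuePattern -> _Integer, the integer 3 in the third row is treated as missing *)
SynthesizeMissingValues[
 {{1.1, 1.4}, {2.3, 3.1}, {3, 4.2}, {5.4, 5.4}, {8.7, 7.5}},
 MissingValuePattern -> _Integer]
```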

Options  (6)

FeatureTypes  (1)

Specify that the first feature should be interpreted as a nominal variable, while the others should be determined automatically:
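One way this might look, assuming that a list of feature types may mix explicit types with Automatic entries. Both the data and that list form are assumptions made for illustration:

```wolfram
(* The first column holds category codes, so it is declared nominal;
   the remaining two columns are left to automatic detection (assumed list form) *)
SynthesizeMissingValues[
 {{1, 2.3, 1.4}, {2, 3.1, 0.2}, {1, Missing[], 1.1},
  {Missing[], 2.9, 0.4}, {2, 3.3, 0.3}, {1, 2.1, 1.6}},
 FeatureTypes -> {"Nominal", Automatic, Automatic}]
```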

FeatureNames  (1)

Replace missing values and specify that the feature named "gender" should be considered nominal:
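A hedged sketch with made-up records. The feature name "age" and all values are illustrative assumptions:

```wolfram
(* Name the two columns, then refer to "gender" by name when giving its type *)
SynthesizeMissingValues[
 {{23, "male"}, {31, "female"}, {Missing[], "female"},
  {45, Missing[]}, {27, "male"}, {38, "female"}},
 FeatureNames -> {"age", "gender"},
 FeatureTypes -> <|"gender" -> "Nominal"|>]
```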

Method  (1)

Replace Missing[] values using the "Multinormal" method for computing the distribution:
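A minimal sketch on illustrative data:

```wolfram
(* Model the data with a multivariate normal distribution before synthesizing *)
SynthesizeMissingValues[
 {{1.1, 1.4}, {2.3, 3.1}, {3, 4}, {Missing[], 5.4}, {8.7, 7.5}},
 Method -> "Multinormal"]
```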

Use the "KernelDensityEstimation" method to replace the missing values:
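The same kind of call with a different learning method, again on illustrative data:

```wolfram
(* Model the data with a kernel mixture distribution before synthesizing *)
SynthesizeMissingValues[
 {{1.1, 1.4}, {2.3, 3.1}, {3, 4}, {Missing[], 5.4}, {8.7, 7.5}},
 Method -> "KernelDensityEstimation"]
```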

Specify the method as an association, choosing the evaluation strategy and the learning method for computing the distribution:
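A sketch of the association form, combining one learning method and one evaluation strategy from the settings documented for Method. The data are illustrative:

```wolfram
(* "ModeFinding" attempts to return the mode of the conditioned distribution
   rather than a random sample *)
SynthesizeMissingValues[
 {{1.1, 1.4}, {2.3, 3.1}, {3, 4}, {Missing[], 5.4}, {8.7, 7.5}},
 Method -> <|"LearningMethod" -> "Multinormal",
   "EvaluationStrategy" -> "ModeFinding"|>]
```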

MissingValuePattern  (1)

Replace values that should be assumed missing using MissingValuePattern:
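A possible sketch in which string placeholders stand in for missing entries. The "NA" marker and the data are assumptions chosen for illustration:

```wolfram
(* Every string entry matches _String and is therefore treated as missing *)
SynthesizeMissingValues[
 {{1.1, 1.4}, {2.3, 3.1}, {3.2, "NA"}, {"NA", 5.4}, {8.7, 7.5}},
 MissingValuePattern -> _String]
```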

Specify missing values with Condition in MissingValuePattern:
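A sketch using a conditional pattern, assuming negative numbers mark invalid measurements in the illustrative data:

```wolfram
(* x_ /; x < 0 matches any entry for which the test evaluates to True *)
SynthesizeMissingValues[
 {{1.1, 1.4}, {2.3, 3.1}, {-1, 4.2}, {5.4, -1}, {8.7, 7.5}},
 MissingValuePattern -> (x_ /; x < 0)]
```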

PerformanceGoal  (1)

Synthesize missing values by specifying the PerformanceGoal:
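A minimal sketch on illustrative data, here asking for the highest synthesis quality:

```wolfram
(* Favor synthesis quality over speed *)
SynthesizeMissingValues[
 {{1.1, 1.4}, {2.3, 3.1}, {3, 4}, {Missing[], 5.4}, {8.7, 7.5}},
 PerformanceGoal -> "Quality"]
```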

Compare the imputation time with that of the default PerformanceGoal:
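One way to make the comparison is with AbsoluteTiming. The data are illustrative, and on an input this small the difference may be negligible:

```wolfram
data = {{1.1, 1.4}, {2.3, 3.1}, {3, 4}, {Missing[], 5.4}, {8.7, 7.5}};

(* AbsoluteTiming returns {seconds, result}; First keeps only the elapsed time *)
First[AbsoluteTiming[SynthesizeMissingValues[data, PerformanceGoal -> "Quality"]]]
First[AbsoluteTiming[SynthesizeMissingValues[data]]]
```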

TrainingProgressReporting  (1)

Print the training progress periodically during training:
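A sketch on illustrative data. With so few examples, training may finish before much is reported:

```wolfram
(* Report training information periodically through Print *)
SynthesizeMissingValues[
 {{1.1, 1.4}, {2.3, 3.1}, {3, 4}, {Missing[], 5.4}, {8.7, 7.5}},
 TrainingProgressReporting -> "Print"]
```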

Show training progress interactively without the plots:
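The corresponding sketch for the panel without learning curves, again on illustrative data:

```wolfram
(* Show a dynamically updating panel that omits the learning curves *)
SynthesizeMissingValues[
 {{1.1, 1.4}, {2.3, 3.1}, {3, 4}, {Missing[], 5.4}, {8.7, 7.5}},
 TrainingProgressReporting -> "SimplePanel"]
```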

Applications  (2)

Obtain a dataset of images:

Train a distribution on the images:

Replace the value that should be considered missing with the samples that are generated from the learned distribution:

Obtain a dataset related to features of moons of Jupiter that contains missing values:

Replace missing values in the dataset:

Wolfram Research (2019), SynthesizeMissingValues, Wolfram Language function,

