FindDistribution

FindDistribution[data]

finds a simple functional form to fit the distribution of data.

FindDistribution[data,n]

finds up to n best distributions.

FindDistribution[data,n,prop]

returns up to n best distributions associated with property prop.

FindDistribution[data,n,{prop1,prop2,…}]

returns up to n best distributions associated with properties prop1, prop2, etc.

Details and Options

Examples


Basic Examples  (2)

Create a list of uniformly distributed random integers:

Find the underlying distribution from the data:
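A minimal sketch of these two steps. The integer range 0 to 10 and the sample size of 1000 are illustrative assumptions, not values taken from this page:

```wolfram
(* Assumed for illustration: 1000 integers drawn uniformly from 0..10 *)
data = RandomInteger[10, 1000];

(* Ask FindDistribution for a simple distribution that fits the sample *)
FindDistribution[data]
```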

Generate data sampled from an exponential distribution:

Find the best distribution from the data:

Compare the PDFs for the original and estimated distributions:

Return the best three distributions:

Compare their Bayesian information criterion and Akaike information criterion values:
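The steps above could be sketched as follows. The rate parameter 2, the sample size, the plot range, and the symbol names `orig`, `data`, and `best` are assumptions made for illustration. The property names "BIC" and "AIC" are likewise assumed to be the strings that select the two criteria:

```wolfram
(* Assumed for illustration: rate 2 and 1000 samples *)
orig = ExponentialDistribution[2];
data = RandomVariate[orig, 1000];

(* Single best candidate *)
best = FindDistribution[data]

(* Overlay the PDFs of the original and estimated distributions *)
Plot[{PDF[orig, x], PDF[best, x]}, {x, 0, 3},
 PlotLegends -> {"original", "estimated"}]

(* Up to three best candidates *)
FindDistribution[data, 3]

(* The same candidates together with their BIC and AIC values;
   the property names "BIC" and "AIC" are assumed here *)
FindDistribution[data, 3, {"BIC", "AIC"}]
```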

Scope  (3)

Generate data sampled from a mixture distribution:

Estimate the best distribution from this data:

Compare the PDFs for the original and estimated distributions:
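One possible version of this example, assuming a two-component normal mixture. The weights, component parameters, sample size, and plot range are illustrative choices only:

```wolfram
(* Assumed mixture: weights and component parameters are illustrative *)
orig = MixtureDistribution[{0.4, 0.6},
   {NormalDistribution[-2, 1], NormalDistribution[3, 1.5]}];
data = RandomVariate[orig, 2000];

(* Estimate a distribution from the sample *)
est = FindDistribution[data]

(* Compare the original and estimated PDFs on a range covering both modes *)
Plot[{PDF[orig, x], PDF[est, x]}, {x, -6, 9},
 PlotLegends -> {"original", "estimated"}]
```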

Estimate parameters for a particular distribution:

By default, FindDistribution returns a simpler distribution:

Specify the type of distribution to look for:
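A hedged sketch of restricting the search to one family with the TargetFunctions option. The Weibull parameters and the sample size are assumptions, and the result of the unrestricted call depends on the sample:

```wolfram
(* Assumed for illustration: Weibull data with shape 2 and scale 3 *)
data = RandomVariate[WeibullDistribution[2, 3], 500];

(* Unrestricted search: may return a different, simpler family *)
FindDistribution[data]

(* Restrict the search to WeibullDistribution, so only its parameters are estimated *)
FindDistribution[data, TargetFunctions -> {WeibullDistribution}]
```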

Generate data sampled from an exponential distribution:

Generate a Dataset object containing all properties for the top 2 distributions:

Options  (5)

TargetFunctions  (3)

Generate data samples from a mixture distribution:

Estimate parameters for specific distributions:

Compare the PDFs for the original and estimated distributions:

Time between geyser eruptions:

Estimate the distribution of the data:

Estimate the distribution of the data when treated as continuous:

Estimate the distribution of the data when treated as continuous using GammaDistribution:

Compare the histogram of the data to the PDF of the estimated distributions:

Estimate parameters for specific distributions, assuming priors over them:

The magnitudes of earthquakes in the United States in the years 1935–1989 have two modes:

Estimate the best fit without using TargetFunctions:

Estimate the best fit using priors over distributions:

Compare the histogram to the PDFs of the estimated distributions:

PerformanceGoal  (1)

Generate data samples from a mixture distribution:

Estimate the best fit for a large dataset and compare the AbsoluteTiming for different settings of PerformanceGoal:

Compare the LogLikelihood of the solutions:
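A sketch of such a comparison, assuming the two settings "Speed" and "Quality" and an illustrative mixture of two normal distributions. The variable names are hypothetical, and timings vary by machine:

```wolfram
(* Assumed for illustration: 10^5 samples from a two-component mixture *)
data = RandomVariate[
   MixtureDistribution[{0.5, 0.5},
    {NormalDistribution[0, 1], NormalDistribution[5, 2]}], 10^5];

(* AbsoluteTiming returns {seconds, result} *)
{tSpeed, dSpeed} =
  AbsoluteTiming[FindDistribution[data, PerformanceGoal -> "Speed"]];
{tQuality, dQuality} =
  AbsoluteTiming[FindDistribution[data, PerformanceGoal -> "Quality"]];

(* Elapsed time in seconds for each setting *)
{tSpeed, tQuality}

(* Log-likelihood of the data under each estimated distribution *)
{LogLikelihood[dSpeed, data], LogLikelihood[dQuality, data]}
```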

RandomSeeding  (1)

Generate data samples from a mixture distribution:

Compare different rounds of FindDistribution and notice how they differ:

Use the option RandomSeeding to avoid having different results:
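A minimal sketch, assuming an illustrative mixture and the arbitrary seed 1234. Whether the unseeded runs actually differ depends on the data:

```wolfram
(* Assumed for illustration: 1000 samples from a two-component mixture *)
data = RandomVariate[
   MixtureDistribution[{0.3, 0.7},
    {NormalDistribution[0, 1], NormalDistribution[4, 1]}], 1000];

(* Repeated runs without a fixed seed may return different results *)
Table[FindDistribution[data], 3]

(* Fixing the seed makes repeated runs reproducible; 1234 is arbitrary *)
Table[FindDistribution[data, RandomSeeding -> 1234], 3]
```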

Applications  (5)

Lengths of Words Beginning with a Particular Letter  (1)

Lengths of all English words in a dictionary that begin with different vowels:

Estimate the distribution for different vowels:

Compare the histograms of the original data to the PDFs of the estimated distributions:
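One way this application could be set up, assuming the built-in DictionaryLookup word list stands in for the dictionary. The names `vowels`, `lengths`, and `dists`, the bin width, and the plot range are illustrative:

```wolfram
vowels = {"a", "e", "i", "o", "u"};

(* Word lengths for all dictionary words that start with each vowel *)
lengths = AssociationMap[
   StringLength /@ DictionaryLookup[# ~~ ___] &, vowels];

(* One estimated distribution per vowel *)
dists = FindDistribution /@ lengths

(* Histogram of the data against the estimated PDF, shown for "a" *)
Show[
 Histogram[lengths["a"], {1}, "PDF"],
 DiscretePlot[PDF[dists["a"], k], {k, 1, 20}, PlotStyle -> Red]]
```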

Text Frequency  (1)

Count the number of occurrences of words in the Declaration of Independence:

Estimate the distribution of the word count:

Compare the histograms of the original data to the PDF of the estimated distribution:

Melanoma in Denmark  (1)

Age of patients affected by melanoma:

Estimate the distribution of the data:

Compare the histogram of the data to the PDF of the estimated distribution:

Infection Time for AIDS  (1)

Infection time for AIDS in years:

Estimate the distribution of the data:

Compare the histogram of the data to the PDF of the estimated distribution:

Time to Kidney Infection after Catheter Replacement  (1)

Time to kidney infection in months:

Estimate the distribution of the data:

Compare the histogram of the data to the PDF of the estimated distribution:

Properties & Relations  (1)

By default, FindDistributionParameters uses maximum likelihood to estimate distribution parameters for a fixed distribution. FindDistribution uses a full Bayesian approach by combining the Bayesian information criterion with priors over distributions to select both the best distribution and the best parameters for it.

Generate data sampled from a StudentTDistribution:

Use FindDistribution to estimate the best distribution that fits the data:

Use FindDistributionParameters to estimate the best parameters, assuming a StudentTDistribution:

Even though the StudentTDistribution gives the higher log-likelihood, the LogisticDistribution has a larger prior and lower complexity.

Compare the corresponding LogLikelihood:

The option TargetFunctions can be used if you want to find roughly the same parameters as FindDistributionParameters:
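The comparison in this section could be sketched as follows. The Student t parameters (location 0, scale 1, 4 degrees of freedom), the sample size, and the symbols `m`, `s`, `v`, `found`, and `fitted` are assumptions made for illustration. The distribution chosen by the unrestricted call depends on the sample:

```wolfram
(* Assumed for illustration: location 0, scale 1, 4 degrees of freedom *)
data = RandomVariate[StudentTDistribution[0, 1, 4], 1000];

(* Selects both a family and its parameters *)
found = FindDistribution[data]

(* Maximum likelihood estimate within a fixed family; returns rules for m, s, v *)
params = FindDistributionParameters[data, StudentTDistribution[m, s, v]]
fitted = StudentTDistribution[m, s, v] /. params;

(* Log-likelihood of the data under each result *)
{LogLikelihood[found, data], LogLikelihood[fitted, data]}

(* Restricting the search to the same family should give similar parameters *)
FindDistribution[data, TargetFunctions -> {StudentTDistribution}]
```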

Introduced in 2015 (10.3) | Updated in 2017 (11.2)