FindDistribution

FindDistribution[data]

finds a simple functional form to fit the distribution of data.

FindDistribution[data,n]

finds up to n best distributions.

FindDistribution[data,n,prop]

returns up to n best distributions associated with property prop.

FindDistribution[data,n,{prop₁,prop₂,…}]

returns up to n best distributions associated with properties prop₁, prop₂, etc.

Details and Options

The data must be a list of possible outcomes from a univariate distribution.
FindDistribution[data,n,All] creates a Dataset object with all possible properties.
Properties supported include:

	"BIC"	Bayesian information criterion
	"AIC"	Akaike information criterion
	"HQIC"	Hannan–Quinn information criterion
	"Score"	internal score
	"Complexity"	complexity of the distribution
	"LogLikelihood"	LogLikelihood value
	"PearsonChiSquare"	PearsonChiSquareTest p-value
	"CramerVonMises"	CramerVonMisesTest p-value
	All	all the previous properties

The following options can be given:

MaxItems	Infinity	maximum number of distributions in a mixture distribution
PerformanceGoal	Automatic	aspect of performance to optimize
RandomSeeding	Automatic	what seeding of pseudorandom generators should be done internally
TargetFunctions	Automatic	types of distributions to consider

Possible settings for PerformanceGoal include:

	"Speed"	minimize the time spent to find distributions
	"Quality"	try to find better distributions

Possible settings for TargetFunctions include:

	Automatic	automatically chosen distributions
	All	all built-in distributions
	"Continuous"	all continuous distributions
	"Discrete"	all discrete distributions
	{dist₁,dist₂,…}	distributions dist_i
	{{w₁,w₂,…},{dist₁,dist₂,…}}	distributions dist_i using weights w_i

Possible settings for RandomSeeding include:

	Automatic	automatically reseed every time the function is called
	Inherited	use externally seeded random numbers
	seed	use an explicit integer or string as a seed

Possible continuous distributions for TargetFunctions are: BetaDistribution, CauchyDistribution, ChiDistribution, ChiSquareDistribution, ExponentialDistribution, ExtremeValueDistribution, FrechetDistribution, GammaDistribution, GumbelDistribution, HalfNormalDistribution, InverseGaussianDistribution, LaplaceDistribution, LevyDistribution, LogisticDistribution, LogNormalDistribution, MaxwellDistribution, NormalDistribution, ParetoDistribution, RayleighDistribution, StudentTDistribution, UniformDistribution, WeibullDistribution, HistogramDistribution.
Possible discrete distributions for TargetFunctions are: BenfordDistribution, BinomialDistribution, BorelTannerDistribution, DiscreteUniformDistribution, GeometricDistribution, LogSeriesDistribution, NegativeBinomialDistribution, PascalDistribution, PoissonDistribution, WaringYuleDistribution, ZipfDistribution, HistogramDistribution, EmpiricalDistribution.
The internal information criterion uses a Bayesian information criterion together with priors over TargetFunctions.

Examples

Basic Examples (2)

Create a list of uniformly distributed random integers:

Find the underlying distribution from the data:
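
The inputs for this example are not reproduced above. A minimal sketch, assuming a sample of 1000 integers drawn uniformly from 0 to 10 (the range and sample size are illustrative choices, not taken from the original), might look like:

```wolfram
(* Illustrative sample: 1000 integers drawn uniformly from 0..10 *)
data = RandomInteger[10, 1000];

(* Ask for a simple functional form that fits the sample *)
FindDistribution[data]
```

For data of this kind, a DiscreteUniformDistribution would be a plausible result, though the exact output depends on the sample.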

Generate data sampled from an exponential distribution:

Find the best distribution from the data:

Compare the PDFs for the original and estimated distributions:

Return the best three distributions:

Compare their Bayesian information criterion and Akaike information criterion values:
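
The steps above can be sketched as follows. The rate parameter 2, the sample size and the plot range are assumptions made for illustration only:

```wolfram
(* Illustrative sample from an exponential distribution with rate 2 *)
data = RandomVariate[ExponentialDistribution[2], 1000];

(* Best single distribution *)
dist = FindDistribution[data]

(* Overlay the original and estimated PDFs *)
Plot[{PDF[ExponentialDistribution[2], x], PDF[dist, x]}, {x, 0, 3},
 PlotLegends -> {"original", "estimated"}]

(* Up to three best candidates *)
FindDistribution[data, 3]

(* The same candidates, reported with their BIC and AIC values *)
FindDistribution[data, 3, {"BIC", "AIC"}]
```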

Scope (3)

Generate data sampled from a mixture distribution:

Estimate the best distribution from this data:

Compare the PDFs for the original and estimated distributions:
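
A possible version of this example is sketched below. The mixture weights and component parameters are hypothetical values chosen so that the two modes are clearly separated:

```wolfram
(* Hypothetical two-component normal mixture *)
mix = MixtureDistribution[{0.4, 0.6},
   {NormalDistribution[0, 1], NormalDistribution[5, 1.5]}];
data = RandomVariate[mix, 2000];

(* Estimate a distribution from the sample *)
est = FindDistribution[data]

(* Compare the original and estimated densities *)
Plot[{PDF[mix, x], PDF[est, x]}, {x, -4, 10},
 PlotLegends -> {"original", "estimated"}]
```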

Estimate parameters for a particular distribution:

By default, FindDistribution returns a simpler distribution:

Specify the type of distribution to look for:
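
As a rough sketch of this example, assuming gamma-distributed data (the choice of GammaDistribution and its parameters is illustrative, not taken from the original), the default call and the restricted call can be compared like this:

```wolfram
(* Illustrative sample; the generating distribution is an assumption *)
data = RandomVariate[GammaDistribution[2, 3], 500];

(* Default search: may return a simpler family than the one that generated the data *)
FindDistribution[data]

(* Restrict the search to one family, so that only its parameters are estimated *)
FindDistribution[data, TargetFunctions -> {GammaDistribution}]
```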

Generate data sampled from an exponential distribution:

Generate a Dataset object containing all properties for the top 2 distributions:
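
A minimal sketch of these two steps, assuming an exponential sample with rate 2 and 1000 points (both illustrative):

```wolfram
(* Illustrative exponential sample *)
data = RandomVariate[ExponentialDistribution[2], 1000];

(* Dataset with every supported property for the two best candidates *)
FindDistribution[data, 2, All]
```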

Options (5)

TargetFunctions (3)

Generate data samples from a mixture distribution:

Estimate parameters for specific distributions:

Compare the PDFs for the original and estimated distributions:

Time between geyser eruptions:

Estimate the distribution of the data:

Estimate the distribution of the data when treated as continuous:

Estimate the distribution of the data when treated as continuous using GammaDistribution:

Compare the histogram of the data to the PDF of the estimated distributions:

Estimate parameters for specific distributions, assuming priors over them:

The magnitudes of earthquakes in the United States in the years 1935–1989 have two modes:

Estimate the best fit without using TargetFunctions:

Estimate the best fit using priors over distributions:

Compare the histogram to the PDFs of the estimated distributions:

PerformanceGoal (1)

Generate data samples from a mixture distribution:

Estimate the best fit for a big dataset and compare the AbsoluteTiming for different settings of PerformanceGoal:

Compare the LogLikelihood of the solutions:
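
One way to set up this comparison is sketched below. The mixture, the sample size of 10^5 and the variable names are assumptions for illustration; actual timings depend on the machine and the sample:

```wolfram
(* Hypothetical large sample from a two-component mixture *)
data = RandomVariate[
   MixtureDistribution[{0.3, 0.7},
    {NormalDistribution[0, 1], NormalDistribution[6, 2]}], 10^5];

(* AbsoluteTiming returns {seconds, result} *)
{tSpeed, dSpeed} =
  AbsoluteTiming[FindDistribution[data, PerformanceGoal -> "Speed"]];
{tQuality, dQuality} =
  AbsoluteTiming[FindDistribution[data, PerformanceGoal -> "Quality"]];

(* Time spent under each setting *)
{tSpeed, tQuality}

(* Log-likelihood of the data under each result; larger is better *)
{LogLikelihood[dSpeed, data], LogLikelihood[dQuality, data]}
```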

RandomSeeding (1)

Generate data samples from a mixture distribution:

Run FindDistribution several times on the same data and note how the results differ:

Use the option RandomSeeding with an explicit seed to obtain the same result on every run:
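
The following sketch illustrates the idea. The mixture parameters and the seed value 1234 are arbitrary choices, and whether the unseeded runs visibly differ depends on the sample:

```wolfram
(* Hypothetical mixture sample *)
data = RandomVariate[
   MixtureDistribution[{0.5, 0.5},
    {NormalDistribution[0, 1], NormalDistribution[4, 1]}], 1000];

(* Default seeding: each call reseeds internally, so results may vary *)
Table[FindDistribution[data], 3]

(* Explicit seed: repeated calls use the same internal random numbers *)
Table[FindDistribution[data, RandomSeeding -> 1234], 3]
```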

Applications (5)

Lengths of Words Beginning with a Particular Letter (1)

Lengths of all English words in a dictionary that begin with different vowels:

Estimate the distribution for different vowels:

Compare the histograms of the original data to the PDFs of the estimated distributions:

Text Frequency (1)

Count the number of occurrences of words in the Declaration of Independence:

Estimate the distribution of the word count:

Compare the histograms of the original data to the PDF of the estimated distribution:
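
A possible reconstruction of this application is sketched below. It assumes the text is taken from ExampleData and that words are counted case-insensitively; the original preprocessing is not shown here and may differ:

```wolfram
(* Load the text; the ExampleData source is an assumption *)
text = ExampleData[{"Text", "DeclarationOfIndependence"}];

(* Number of occurrences of each distinct word, ignoring case *)
counts = Values[Counts[ToLowerCase[TextWords[text]]]];

(* Estimate a distribution for the occurrence counts *)
dist = FindDistribution[counts]

(* Histogram of the counts with the estimated PDF overlaid *)
Show[
 Histogram[counts, {1}, "PDF"],
 DiscretePlot[PDF[dist, k], {k, 1, 20}, PlotStyle -> Red]
]
```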

Melanoma in Denmark (1)

Age of patients affected by melanoma:

Estimate the distribution of the data:

Compare the histogram of the data to the PDF of the estimated distribution:

Infection Time for AIDS (1)

Infection time for AIDS in years:

Estimate the distribution of the data:

Compare the histogram of the data to the PDF of the estimated distribution:

Time to Kidney Infection after Catheter Replacement (1)

Time to kidney infection in months:

Estimate the distribution of the data:

Compare the histogram of the data to the PDF of the estimated distribution:

Properties & Relations (1)

By default, FindDistributionParameters uses maximum likelihood to estimate distribution parameters for a fixed distribution. FindDistribution uses a full Bayesian approach by combining the Bayesian information criterion with priors over distributions to select both the best distribution and the best parameters for it.

Generate data sampled from a StudentTDistribution:

Use FindDistribution to estimate the best distribution that fits the data:

Use FindDistributionParameters to estimate the best parameters, assuming a StudentTDistribution:

Even though the StudentTDistribution attains the larger log-likelihood, the LogisticDistribution has a larger prior and a smaller complexity, so it is preferred by FindDistribution.

Compare the corresponding LogLikelihood:

The option TargetFunctions can be used if you want to find roughly the same parameters as FindDistributionParameters:
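
The comparison described in this section can be sketched as follows. The StudentTDistribution parameters, the sample size and the symbol names mu, sigma and nu are illustrative assumptions, and the distribution that FindDistribution selects depends on the sample:

```wolfram
(* Illustrative sample from a three-parameter Student t distribution *)
data = RandomVariate[StudentTDistribution[0, 1, 5], 1000];

(* Free search over distribution families *)
found = FindDistribution[data]

(* Maximum likelihood fit with the family fixed in advance *)
params = FindDistributionParameters[data,
   StudentTDistribution[mu, sigma, nu]]

(* Log-likelihood of the data under each result *)
{LogLikelihood[found, data],
 LogLikelihood[StudentTDistribution[mu, sigma, nu] /. params, data]}

(* Restricting the search to the same family gives roughly the same parameters *)
FindDistribution[data, TargetFunctions -> {StudentTDistribution}]
```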
