Wolfram Research, Inc.

3.2.14 Statistical Distributions and Related Functions

There are standard Mathematica packages for evaluating functions related to common statistical distributions. Mathematica represents the statistical distributions themselves in the symbolic form name[param1, param2, ... ], where param1, param2, ... are the parameters of the distribution. Functions such as Mean, which give properties of statistical distributions, take the symbolic representation of the distribution as an argument.
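The same design, with a distribution as an inert object that only holds its parameters and generic property functions that dispatch on its type, can be sketched outside Mathematica. The Python sketch below is an illustration only, not how the packages are implemented: the classes merely mirror the Mathematica heads, and `mean` and `variance` are hypothetical helpers built on `functools.singledispatch`.

```python
from dataclasses import dataclass
from functools import singledispatch


# Hypothetical stand-ins for NormalDistribution[mu, sigma] and
# ChiSquareDistribution[n]: they store parameters and nothing else.
@dataclass(frozen=True)
class NormalDistribution:
    mu: float
    sigma: float


@dataclass(frozen=True)
class ChiSquareDistribution:
    n: int


@singledispatch
def mean(dist):
    """Generic 'Mean': the rule applied is chosen by the type of dist."""
    raise NotImplementedError(f"no mean rule for {type(dist).__name__}")


@mean.register
def _(dist: NormalDistribution) -> float:
    return dist.mu


@mean.register
def _(dist: ChiSquareDistribution) -> float:
    return float(dist.n)


@singledispatch
def variance(dist):
    """Generic 'Variance', dispatched the same way."""
    raise NotImplementedError(f"no variance rule for {type(dist).__name__}")


@variance.register
def _(dist: NormalDistribution) -> float:
    return dist.sigma ** 2


@variance.register
def _(dist: ChiSquareDistribution) -> float:
    return 2.0 * dist.n


print(mean(NormalDistribution(0.0, 1.0)), variance(ChiSquareDistribution(3)))  # prints 0.0 6.0
```

Adding a new distribution then means adding a class plus one registered rule per property, without touching the generic functions.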

Statistical distributions from the package Statistics`ContinuousDistributions`.

Most of the continuous statistical distributions commonly used are derived from the normal or Gaussian distribution NormalDistribution[μ, σ]. This distribution has probability density $\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$. If you take independent random variables that all follow the same distribution with finite variance, then the Central Limit Theorem shows that the distribution of the mean of a large number of these variables approaches a normal distribution.
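As a quick numerical check of both statements, the following Python sketch (standard library only, not the Mathematica package) compares the density formula with `statistics.NormalDist.pdf` and then simulates means of independent Uniform(0, 1) draws. The choice of uniform inputs, the sample sizes, and the seed are arbitrary assumptions made for the illustration.

```python
import math
import random
import statistics
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def uniform_means(n_terms, n_means, rng):
    """n_means sample means, each taken over n_terms independent Uniform(0, 1) draws."""
    return [statistics.fmean(rng.random() for _ in range(n_terms))
            for _ in range(n_means)]


# Uniform(0, 1) has mean 1/2 and variance 1/12, so by the Central Limit
# Theorem the mean of n_terms draws is roughly Normal(1/2, sqrt(1/(12 n_terms))).
n_terms = 48
rng = random.Random(12345)
means = uniform_means(n_terms, 20000, rng)
predicted = NormalDist(0.5, math.sqrt(1.0 / (12 * n_terms)))

# A normal variable lies within one standard deviation of its mean with
# probability about 0.6827; compare the simulated means with that figure.
within_one_sd = sum(abs(m - predicted.mean) <= predicted.stdev for m in means) / len(means)
print(f"fraction within 1 sd: {within_one_sd:.3f}")
```

The uniform distribution is far from normal, which is what makes the close agreement of the simulated means informative.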

The logarithmic normal distribution or lognormal distribution LogNormalDistribution[μ, σ] is the distribution followed by the exponential of a normally distributed random variable with mean μ and standard deviation σ. This distribution arises when many independent positive random variables are combined multiplicatively.
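Both descriptions can be checked by simulation. This Python sketch (standard library only, and assuming the μ, σ parametrization just described) exponentiates normal draws and compares them with the standard lognormal median $e^{\mu}$ and mean $e^{\mu+\sigma^2/2}$; it then multiplies many independent factors, drawn here from a hypothetical Uniform(0.9, 1.1) choice, and checks that the logarithm of the product looks normal.

```python
import math
import random
import statistics

rng = random.Random(2024)
mu, sigma = 0.3, 0.5

# LogNormalDistribution[mu, sigma]: exponentiate Normal(mu, sigma) draws.
draws = [math.exp(rng.gauss(mu, sigma)) for _ in range(50000)]

# Standard closed forms for a lognormal variable.
expected_median = math.exp(mu)
expected_mean = math.exp(mu + sigma ** 2 / 2)
print(statistics.median(draws), expected_median)
print(statistics.fmean(draws), expected_mean)


def product_of_factors(n_factors, rng):
    """Multiply n_factors independent positive factors drawn from Uniform(0.9, 1.1)."""
    prod = 1.0
    for _ in range(n_factors):
        prod *= rng.uniform(0.9, 1.1)
    return prod


# log(product) is a sum of independent terms, so by the Central Limit Theorem
# it is roughly normal; equivalently, the product is roughly lognormal.
logs = [math.log(product_of_factors(200, rng)) for _ in range(5000)]
center, spread = statistics.fmean(logs), statistics.stdev(logs)
within_one_sd = sum(abs(v - center) <= spread for v in logs) / len(logs)
print(f"fraction of log-products within 1 sd: {within_one_sd:.3f}")
```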

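The chi-square distribution ChiSquareDistribution[n] is the distribution of the quantity $\sum_{i=1}^{n} x_i^2$, where the $x_i$ are independent random variables which follow a normal distribution with mean zero and unit variance. The chi-square distribution gives the distribution of variances of samples from a normal distribution: if $s^2$ is the variance of $m$ observations drawn from a normal distribution with variance $\sigma^2$, then $(m-1)s^2/\sigma^2$ follows a chi-square distribution with $m-1$ degrees of freedom.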
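Both characterizations can be checked by simulation. The Python sketch below (standard library only; sample sizes, parameters, and seed are arbitrary assumptions) builds chi-square draws as sums of squared standard normals and compares their sample mean and variance with the known values $n$ and $2n$, then repeats the check for scaled sample variances.

```python
import random
import statistics


def chi_square_draw(n, rng):
    """x_1^2 + ... + x_n^2 with independent x_i ~ Normal(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


rng = random.Random(7)
n = 5
draws = [chi_square_draw(n, rng) for _ in range(40000)]
# ChiSquareDistribution[n] has mean n and variance 2n.
print(statistics.fmean(draws), statistics.variance(draws))

# Sample variances: with m observations from Normal(10, sigma), the scaled
# sample variance (m - 1) s^2 / sigma^2 should behave like ChiSquare[m - 1].
m, sigma = 6, 3.0
scaled = [
    (m - 1) * statistics.variance([rng.gauss(10.0, sigma) for _ in range(m)]) / sigma ** 2
    for _ in range(20000)
]
print(statistics.fmean(scaled), statistics.variance(scaled))
```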

The Student t distribution StudentTDistribution[n] is the distribution followed by the ratio $z/\sqrt{v/n}$, where $z$ follows the normal distribution with mean zero and unit variance and $v$ is an independent variable that follows the chi-square distribution with $n$ degrees of freedom. The $t$ distribution characterizes the uncertainty in a mean when both the mean and variance are obtained from data.
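The construction and its use for means estimated from data can both be illustrated by simulation. In this Python sketch (standard library only, not the Mathematica package), the variance of simulated $t$ draws is compared with the known value $n/(n-2)$. The studentized mean of 5 normal observations is then checked against the two-sided 5% critical value of the $t$ distribution with 4 degrees of freedom, about 2.776. The parameters and seed are arbitrary.

```python
import math
import random
import statistics


def t_draw(n, rng):
    """z / sqrt(v / n) with z ~ Normal(0, 1) and independent v ~ ChiSquare[n]."""
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return z / math.sqrt(v / n)


rng = random.Random(11)
n = 10
draws = [t_draw(n, rng) for _ in range(40000)]
# StudentTDistribution[n] has variance n / (n - 2) for n > 2.
print(statistics.variance(draws), n / (n - 2))

# Uncertainty in a mean estimated from m = 5 data points: the studentized
# mean (xbar - mu) / (s / sqrt(m)) follows the t distribution with m - 1 = 4
# degrees of freedom, whose two-sided 5% critical value is about 2.776.
m, mu = 5, 2.0
trials = 40000
exceed = 0
for _ in range(trials):
    data = [rng.gauss(mu, 1.5) for _ in range(m)]
    t_stat = (statistics.fmean(data) - mu) / (statistics.stdev(data) / math.sqrt(m))
    if abs(t_stat) > 2.776:
        exceed += 1
print(f"fraction beyond 2.776: {exceed / trials:.4f}")
```

Using the normal value 1.96 in place of 2.776 would reject far more often than 5% here, which is the practical reason for the $t$ distribution when the variance is itself estimated.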

The F-ratio distribution, F-distribution or variance ratio distribution FRatioDistribution[n1, n2] is the distribution of the ratio $(u/n_1)/(v/n_2)$, where $u$ and $v$ are independent chi-square variables with $n_1$ and $n_2$ degrees of freedom, each divided by its number of degrees of freedom. The F-ratio distribution is used in the analysis of variance for comparing variances from different models.
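A simulation can illustrate the construction and its variance-ratio use. The Python sketch below (standard library only, with arbitrary parameters and seeds) draws F-ratio variables from two independent chi-square variables and compares their mean with the known value $n_2/(n_2-2)$. It then forms ratios of sample variances from two normal samples with equal variance, which should follow the same F-ratio distribution with $(5-1, 11-1)$ degrees of freedom.

```python
import random
import statistics


def chi_square_draw(k, rng):
    """Sum of squares of k independent Normal(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


def f_draw(n1, n2, rng):
    """(u / n1) / (v / n2) with independent u ~ ChiSquare[n1], v ~ ChiSquare[n2]."""
    return (chi_square_draw(n1, rng) / n1) / (chi_square_draw(n2, rng) / n2)


rng = random.Random(3)
n1, n2 = 4, 10
draws = [f_draw(n1, n2, rng) for _ in range(40000)]
# FRatioDistribution[n1, n2] has mean n2 / (n2 - 2) for n2 > 2.
print(statistics.fmean(draws), n2 / (n2 - 2))

# Variance-ratio use: two independent normal samples with the same variance
# (different means do not matter). s1^2 / s2^2 with 5 and 11 points
# should follow the F-ratio distribution with 4 and 10 degrees of freedom.
ratios = []
for _ in range(20000):
    s1 = statistics.variance([rng.gauss(0.0, 2.0) for _ in range(5)])
    s2 = statistics.variance([rng.gauss(7.0, 2.0) for _ in range(11)])
    ratios.append(s1 / s2)
print(statistics.fmean(ratios))
```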

The extreme value distribution ExtremeValueDistribution[α, β] is the limiting distribution for the smallest or largest values in large samples drawn from a variety of distributions, including the normal distribution.
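The limiting behavior for largest values is easiest to see with exponential samples, where the finite-sample distribution of the maximum has a closed form. This Python sketch (standard library only) uses exponential rather than normal samples for that reason; normal maxima converge to the same limiting form, but more slowly. It assumes the largest-value form $\exp(-e^{-(x-\alpha)/\beta})$ as the location-scale parametrization. Whether ExtremeValueDistribution[α, β] uses exactly this form is an assumption that should be checked against the package documentation.

```python
import math
import random


def gumbel_cdf(x, alpha=0.0, beta=1.0):
    """Largest-value extreme value cdf exp(-exp(-(x - alpha) / beta)) (assumed form)."""
    return math.exp(-math.exp(-(x - alpha) / beta))


def shifted_max_cdf(x, n):
    """Exact P(max of n Exp(1) draws - ln n <= x) = (1 - e^(-x) / n)^n, for x > -ln n."""
    return (1.0 - math.exp(-x) / n) ** n


# As n grows, the exact finite-n cdf approaches the limiting extreme value cdf
# (checked on the grid x = -2.0, -1.9, ..., 5.9).
for n in (10, 100, 100000):
    gap = max(abs(shifted_max_cdf(x / 10, n) - gumbel_cdf(x / 10)) for x in range(-20, 60))
    print(n, gap)

# Simulation: empirical P(max - ln n <= 0) versus exp(-1), about 0.3679.
rng = random.Random(5)
n, reps = 200, 4000
hits = sum(max(rng.expovariate(1.0) for _ in range(n)) - math.log(n) <= 0.0
           for _ in range(reps))
print(hits / reps, gumbel_cdf(0.0))
```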

Functions of statistical distributions.

The cumulative distribution function (cdf) CDF[dist, x] is given by the integral of the probability density function for the distribution up to the point $x$. For the normal distribution, the cdf is usually denoted $\Phi(x)$. Cumulative distribution functions are used in evaluating probabilities for statistical hypotheses. For discrete distributions, the cdf is given by the sum of the probabilities up to the point $x$. The cdf is sometimes called simply the distribution function. The cdf at a particular point $x$ for a given distribution is often denoted $P(x \mid \theta_1, \theta_2, \ldots)$, where the $\theta_i$ are parameters of the distribution. The upper tail area is given in terms of the cdf by $Q(x \mid \theta_i) = 1 - P(x \mid \theta_i)$. Thus, for example, the upper tail area for a chi-square distribution with $\nu$ degrees of freedom is denoted $Q(\chi^2 \mid \nu)$ and is given by 1 - CDF[ChiSquareDistribution[nu], chi2].
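The relation $Q = 1 - P$ can be checked numerically for the chi-square case. This Python sketch (standard library only, not the Mathematica package) computes $P(\chi^2 \mid \nu)$ by integrating the chi-square density with Simpson's rule. It computes $Q(\chi^2 \mid \nu)$ from the standard closed form $e^{-\chi^2/2}\sum_{k=0}^{\nu/2-1} (\chi^2/2)^k/k!$, which holds only for even $\nu$; that restriction is a simplification made for the sketch. The last lines show the normal cdf $\Phi$ and its upper tail.

```python
import math
from statistics import NormalDist


def chi2_pdf(x, nu):
    """Chi-square density x^(nu/2 - 1) e^(-x/2) / (2^(nu/2) Gamma(nu/2)); nu >= 2 here."""
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def chi2_cdf(x, nu):
    """P(chi2 | nu): integral of the density from 0 up to x."""
    return simpson(lambda t: chi2_pdf(t, nu), 0.0, x) if x > 0 else 0.0


def chi2_upper_tail_even(x, nu):
    """Q(chi2 | nu) = 1 - P(chi2 | nu) in closed form; valid for even nu only."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(nu // 2))


for nu in (2, 4, 6):
    for x in (0.5, 3.0, 7.8):
        print(nu, x, chi2_cdf(x, nu) + chi2_upper_tail_even(x, nu))  # should be 1

# Normal distribution: Phi(x) and its upper tail 1 - Phi(x).
phi = NormalDist().cdf(1.96)
print(phi, 1 - phi)
```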

The quantile Quantile[dist, q] is effectively the inverse of the cdf. It gives the value of x at which CDF[dist, x] reaches q. The median is given by Quantile[dist, 1/2]; quartiles, deciles and percentiles can also be expressed as quantiles. Quantiles are used in constructing confidence intervals for statistical parameter estimates.
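Because the cdf is nondecreasing, a quantile can be found by bisection on the cdf. The Python sketch below (standard library only) is one simple way to do this and is not how Mathematica computes Quantile. The helper name `quantile` and the bracketing interval are hypothetical choices. The results are checked against `NormalDist.inv_cdf` and against the chi-square median $2\ln 2$ for 2 degrees of freedom.

```python
import math
from statistics import NormalDist


def quantile(cdf, q, lo, hi, tol=1e-12):
    """Invert a nondecreasing cdf by bisection: approx. smallest x in [lo, hi] with cdf(x) >= q."""
    if not cdf(lo) <= q <= cdf(hi):
        raise ValueError("q is not bracketed by cdf(lo) and cdf(hi)")
    for _ in range(200):
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return hi


std_normal = NormalDist(0.0, 1.0)
q90 = quantile(std_normal.cdf, 0.9, -10.0, 10.0)
# The median is the quantile at 1/2.
median = quantile(NormalDist(3.0, 2.0).cdf, 0.5, -50.0, 50.0)
# Chi-square with 2 degrees of freedom has cdf 1 - exp(-x/2), so its median is 2 ln 2.
chi2_median = quantile(lambda x: 1 - math.exp(-x / 2), 0.5, 0.0, 100.0)
print(round(q90, 5), median, chi2_median)
```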

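The characteristic function CharacteristicFunction[dist, t] is given by $\phi(t) = \int p(x)\, e^{itx}\, dx$, where $p(x)$ is the probability density for a distribution. The $n$th moment about the origin of a distribution is given by the $n$th derivative $i^{-n} \phi^{(n)}(0)$; it coincides with the $n$th central moment only when the mean is zero.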
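For the normal distribution, the characteristic function has the standard closed form $\exp(i\mu t - \sigma^2 t^2/2)$. This Python sketch (standard library only) first checks that closed form against a direct numerical evaluation of the defining integral. It then recovers moments from finite-difference derivatives at $t = 0$. The nonzero mean is chosen deliberately, so the second derivative gives $E[x^2] = \mu^2 + \sigma^2$ rather than the variance. The step size and integration limits are arbitrary assumptions.

```python
import cmath
import math


def normal_char_fn(t, mu, sigma):
    """Closed form for the normal distribution: exp(i mu t - sigma^2 t^2 / 2)."""
    return cmath.exp(1j * mu * t - 0.5 * sigma ** 2 * t ** 2)


def char_fn_numeric(pdf, t, a, b, n=4000):
    """phi(t) = integral of pdf(x) e^(i t x) dx over [a, b], by composite Simpson's rule."""
    h = (b - a) / n

    def f(x):
        return pdf(x) * cmath.exp(1j * t * x)

    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


mu, sigma = 1.5, 0.8


def pdf(x):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


# The defining integral, truncated at mu +/- 12 sigma, matches the closed form.
err = abs(char_fn_numeric(pdf, 0.7, mu - 12 * sigma, mu + 12 * sigma) - normal_char_fn(0.7, mu, sigma))
print(err)

# Moments from derivatives at t = 0 (central finite differences, step h):
# i^-1 phi'(0) is E[x]; i^-2 phi''(0) is E[x^2], a moment about the origin.
h = 1e-4


def phi(t):
    return normal_char_fn(t, mu, sigma)


m1 = ((phi(h) - phi(-h)) / (2 * h) / 1j).real
m2 = ((phi(h) - 2 * phi(0) + phi(-h)) / h ** 2 / 1j ** 2).real
print(m1, m2, m2 - m1 ** 2)  # about 1.5, 2.89 and 0.64 (= sigma^2)
```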

Random[dist] gives pseudorandom numbers that follow the specified distribution. The numbers can be seeded as discussed in Section 3.2.3.

This loads the package which defines continuous statistical distributions.

In[1]:= <<Statistics`ContinuousDistributions`

This represents a normal distribution with mean zero and unit variance.

In[2]:= ndist = NormalDistribution[0, 1]

Out[2]= NormalDistribution[0, 1]

Here is a symbolic result for the cumulative distribution function of the normal distribution.

In[3]:= CDF[ndist, x]

Out[3]= 1/2 (1 + Erf[x/Sqrt[2]])

This gives the value of $x$ at which the cdf of the normal distribution reaches the value 0.9.

In[4]:= Quantile[ndist, 0.9] // N

Out[4]= 1.28155

Here is a list of five normally distributed pseudorandom numbers.

In[5]:= Table[ Random[ndist], {5} ]

Out[5]=
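For readers who want to cross-check these results outside Mathematica, the session above has a rough analogue in Python's `statistics.NormalDist`. This is an independent implementation, not a binding to the package. The closed-form cdf $\frac{1}{2}(1 + \operatorname{erf}(x/\sqrt{2}))$ is written out by hand, and the `seed` argument of `samples` plays the role of the seeding that Section 3.2.3 describes for Random. The seed value 42 is arbitrary.

```python
import math
from statistics import NormalDist

# Analogue of ndist = NormalDistribution[0, 1]
ndist = NormalDist(0.0, 1.0)


def normal_cdf(x):
    """Closed form of the standard normal cdf, (1 + erf(x / sqrt 2)) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Analogue of Quantile[ndist, 0.9] // N
q90 = ndist.inv_cdf(0.9)

# Analogue of Table[Random[ndist], {5}]; fixing the seed makes the list reproducible.
five = ndist.samples(5, seed=42)

print(normal_cdf(0.5), ndist.cdf(0.5))
print(round(q90, 5))  # 1.28155
print(five)
```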

Statistical distributions from the package Statistics`DiscreteDistributions`.

Most of the common discrete statistical distributions can be derived by considering a sequence of "trials", each with two possible outcomes, say "success" and "failure".

The Bernoulli distribution BernoulliDistribution[p] is the probability distribution for a single trial in which success, corresponding to value 1, occurs with probability $p$, and failure, corresponding to value 0, occurs with probability $1-p$.

The binomial distribution BinomialDistribution[n, p] is the distribution of the number of successes that occur in $n$ independent trials when the probability for success in an individual trial is $p$. The probability of exactly $k$ successes is given by $\binom{n}{k} p^k (1-p)^{n-k}$.
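The binomial formula can be checked directly. This Python sketch (standard library only, with arbitrary $n$, $p$ and seed) evaluates it with `math.comb`, confirms that the probabilities sum to 1 with mean $np$ and variance $np(1-p)$, and shows that $n = 1$ gives the Bernoulli case. It also compares one probability with the frequency obtained by counting successes in simulated independent trials.

```python
import math
import random


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials) = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.3
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
print(sum(probs), mean, variance)  # about 1, n p = 3 and n p (1 - p) = 2.1

# With n = 1 this reduces to BernoulliDistribution[p]: P(1) = p, P(0) = 1 - p.
print(binomial_pmf(1, 1, p), binomial_pmf(0, 1, p))

# Simulation: count successes among n independent Bernoulli(p) trials.
rng = random.Random(1)
counts = [sum(rng.random() < p for _ in range(n)) for _ in range(20000)]
print(counts.count(3) / len(counts), probs[3])
```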

The negative binomial distribution NegativeBinomialDistribution[r, p] gives the distribution of the number of failures that occur in a sequence of trials before $r$ successes have occurred, given that the probability for success in each individual trial is $p$.
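Counting arrangements gives the standard probability $\binom{k+r-1}{k} p^r (1-p)^k$ of exactly $k$ failures before the $r$th success, with mean $r(1-p)/p$. The Python sketch below (standard library only, with arbitrary $r$, $p$ and seed) checks this formula against a direct simulation of repeated trials. The infinite support is truncated at 400 terms, which is an approximation, although the neglected tail is negligible for these parameters.

```python
import math
import random


def neg_binomial_pmf(k, r, p):
    """P(k failures before the r-th success) = C(k + r - 1, k) p^r (1 - p)^k."""
    return math.comb(k + r - 1, k) * p ** r * (1 - p) ** k


def failures_before_r_successes(r, p, rng):
    """Run Bernoulli(p) trials until r successes; return the number of failures."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


r, p = 3, 0.4
# Truncate the infinite support at k = 400; the neglected tail is negligible here.
probs = [neg_binomial_pmf(k, r, p) for k in range(400)]
mean = sum(k * pk for k, pk in enumerate(probs))
print(sum(probs), mean, r * (1 - p) / p)  # mean close to 4.5

rng = random.Random(9)
sims = [failures_before_r_successes(r, p, rng) for _ in range(20000)]
print(sum(sims) / len(sims))
```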

The geometric distribution GeometricDistribution[p] gives the distribution of the number of trials before the first success occurs, that is, the number of failures, in a sequence of trials where the probability for success in each individual trial is $p$.
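Since the count of trials before the first success starts at zero, the probabilities are $p(1-p)^k$ for $k = 0, 1, 2, \ldots$, with mean $(1-p)/p$. This is the negative binomial case $r = 1$. The Python sketch below (standard library only) assumes this zero-based convention and checks it against simulated trials. The value of $p$, the truncation at 300 terms, and the seed are arbitrary.

```python
import random


def geometric_pmf(k, p):
    """P(k failures before the first success) = p (1 - p)^k, k = 0, 1, 2, ..."""
    return p * (1 - p) ** k


def failures_before_first_success(p, rng):
    """Run Bernoulli(p) trials and count the failures before the first success."""
    failures = 0
    while rng.random() >= p:
        failures += 1
    return failures


p = 0.25
probs = [geometric_pmf(k, p) for k in range(300)]
print(sum(probs), sum(k * pk for k, pk in enumerate(probs)), (1 - p) / p)  # mean near 3

rng = random.Random(4)
sims = [failures_before_first_success(p, rng) for _ in range(20000)]
print(sum(sims) / len(sims), sims.count(0) / len(sims))  # P(0) should be near p
```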

The hypergeometric distribution HypergeometricDistribution[n, nsucc, ntot] is used in place of the binomial distribution for experiments in which the $n$ trials correspond to sampling without replacement from a population of size $n_{\text{tot}}$ with $n_{\text{succ}}$ potential successes.
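The probability of $k$ successes is the standard counting ratio $\binom{n_{\text{succ}}}{k}\binom{n_{\text{tot}}-n_{\text{succ}}}{n-k} / \binom{n_{\text{tot}}}{n}$. The Python sketch below (standard library only, with hypothetical parameter values) checks it against direct sampling without replacement via `random.sample`. It also shows the binomial distribution becoming a good approximation when the population is much larger than the sample. The names `n_succ` and `n_tot` simply mirror the parameters above.

```python
import math
import random


def hypergeometric_pmf(k, n, n_succ, n_tot):
    """P(k successes in n draws without replacement from n_tot items, n_succ of them successes)."""
    return math.comb(n_succ, k) * math.comb(n_tot - n_succ, n - k) / math.comb(n_tot, n)


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, n_succ, n_tot = 5, 7, 20
probs = [hypergeometric_pmf(k, n, n_succ, n_tot) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(probs))
print(sum(probs), mean, n * n_succ / n_tot)  # mean is n n_succ / n_tot = 1.75

# Sampling without replacement, simulated directly (1 marks a success).
rng = random.Random(8)
population = [1] * n_succ + [0] * (n_tot - n_succ)
draws = [sum(rng.sample(population, n)) for _ in range(20000)]
print(draws.count(2) / len(draws), probs[2])

# For a population much larger than the sample, the binomial is a close approximation.
big_gap = max(abs(hypergeometric_pmf(k, 5, 3000, 10000) - binomial_pmf(k, 5, 0.3))
              for k in range(6))
print(big_gap)
```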

The discrete uniform distribution DiscreteUniformDistribution[n] represents an experiment with $n$ outcomes that occur with equal probabilities.