Statistics`MultinormalDistribution`
The most commonly used probability distributions for multivariate data analysis are those derived from the multinormal (multivariate Gaussian) distribution. This package contains multinormal, multivariate Student $t$, Wishart, Hotelling $T^2$, and quadratic form distributions.
Distributions are usually represented in the symbolic form name[param1, param2, ...]. When there are many parameters, they may be organized into lists, as in the case of QuadraticFormDistribution. Functions such as Mean, which give properties of statistical distributions, take the symbolic representation of the distribution as an argument.
Standard probability distributions derived from the multivariate Gaussian distribution.
A $p$-variate multinormal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ is denoted $N_p(\mu, \Sigma)$. If the vectors $x_i$, $i = 1, \ldots, m$, are independently distributed $N_p(0, \Sigma)$ (where $0$ is the zero vector), and $X$ denotes the $m \times p$ data matrix composed of the $m$ row vectors $x_i$, then the $p \times p$ matrix $X^T X$ has a Wishart distribution with scale matrix $\Sigma$ and degrees of freedom parameter $m$, denoted $W_p(\Sigma, m)$. The Wishart distribution is most typically used when describing the covariance matrix of multinormal samples.
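The construction of a Wishart matrix from multinormal rows can be tried with the package's own variate generator. This is a minimal sketch, assuming the package is available and that RandomArray returns the variates as the rows of a matrix; the names sigma, m, x, and w and the numerical values are illustrative only.

```mathematica
<<Statistics`MultinormalDistribution`

sigma = {{1, 1/2}, {1/2, 2}};   (* illustrative 2 x 2 covariance matrix *)
m = 10;                         (* number of rows, which is the degrees of freedom *)

(* m draws from the zero-mean multinormal, assumed to come back as an m x 2 matrix *)
x = RandomArray[MultinormalDistribution[{0, 0}, sigma], m];

(* the 2 x 2 matrix X'X is one draw from the Wishart distribution with parameters sigma and m *)
w = Transpose[x].x
```

Since the mean of a Wishart matrix with scale matrix sigma and m degrees of freedom is m sigma, averaging w/m over many repetitions should approach sigma.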
A vector that has a multivariate Student $t$ distribution can also be written as a function of a multinormal random vector. Let $X$ be a standardized multinormal vector with covariance matrix $R$ and let $S^2$ be a chi-square variable with $m$ degrees of freedom. (Note that since $X$ is standardized, $0$ is the mean vector of $X$ and $R$ is also the correlation matrix of $X$.) Then $X / \sqrt{S^2/m}$ has a multivariate $t$ distribution with correlation matrix $R$ and $m$ degrees of freedom, denoted $t(R, m)$. The multivariate Student $t$ distribution is elliptically contoured like the multinormal distribution, and characterizes the ratio of a multinormal vector to the standard deviation common to each variate. When $R = I$ and $m = 1$, the multivariate $t$ distribution is the same as the multivariate Cauchy distribution (here $I$ denotes the identity matrix).
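The ratio described above can be formed explicitly. The sketch below assumes that the legacy package Statistics`NormalDistribution` supplies ChiSquareDistribution and that Random accepts it; the degrees of freedom value and the names x, s2, and t are illustrative.

```mathematica
<<Statistics`MultinormalDistribution`
<<Statistics`NormalDistribution`   (* assumed source of ChiSquareDistribution *)

r = {{1, 1/Sqrt[3]}, {1/Sqrt[3], 1}};   (* correlation matrix *)
m = 5;                                  (* degrees of freedom, illustrative *)

x  = Random[MultinormalDistribution[{0, 0}, r]];   (* standardized multinormal vector *)
s2 = Random[ChiSquareDistribution[m]];             (* chi-square variable, drawn independently *)

t = x/Sqrt[s2/m]   (* one draw from the multivariate t distribution with parameters r and m *)
```

Dividing every coordinate by the same random scale is what keeps the contours elliptical while making the tails heavier than those of the multinormal distribution.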
The Hotelling $T^2$ distribution is a univariate distribution proportional to the F-ratio distribution. If vector $d$ and matrix $M$ are independently distributed $N_p(0, \Sigma)$ and $W_p(\Sigma, m)$, then $m\, d^T M^{-1} d$ has the Hotelling $T^2$ distribution with parameters $p$ and $m$, denoted $T^2(p, m)$. This distribution is commonly used to describe the sample Mahalanobis distance between two populations.
A quadratic form in a multinormal vector $X$ distributed $N_p(\mu, \Sigma)$ is given by $X^T A X + b^T X + c$, where $A$ is a symmetric $p \times p$ matrix, $b$ is a $p$-vector, and $c$ is a scalar. This univariate distribution can be useful in discriminant analysis of multinormal samples.
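The proportionality between the Hotelling distribution and the F-ratio distribution can be written out. The relation below is the standard textbook form for a dimensionality parameter $p$ and a degrees of freedom parameter $m \ge p$; it is given here as background and is not taken from the package itself.

```latex
T^2 \sim T^2(p, m)
\quad\Longrightarrow\quad
\frac{m - p + 1}{m\,p}\; T^2 \;\sim\; F_{p,\; m - p + 1}
```

For $p = 1$ this reduces to $T^2 \sim F_{1, m}$, the square of a univariate Student $t$ variable with $m$ degrees of freedom.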
Functions of univariate statistical distributions applicable to multivariate distributions.
In this package distributions are represented in symbolic form. Generally, PDF[dist, x] evaluates the density at x if x is a numerical value, vector, or matrix, and otherwise leaves the function in symbolic form. Similarly, CDF[dist, x] gives the cumulative distribution function and CharacteristicFunction[dist, t] gives the characteristic function of the specified distribution.
In some cases explicit forms of these expressions are not available. For example, PDF[QuadraticFormDistribution[{A, b, c}, {mu, sigma}], x] does not evaluate, but a Series expansion of the PDF about the lower support point of the domain (for a positive definite quadratic form) does evaluate. The CDF of MultinormalDistribution and MultivariateTDistribution is available for numerical vector arguments, but not for symbolic vector arguments. In the case of MultivariateTDistribution, the CharacteristicFunction is expressed in terms of an integral.
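There are limitations on the covariance matrix $\Sigma$ in CDF[MultinormalDistribution[mu, sigma], x] and the correlation matrix $R$ in CDF[MultivariateTDistribution[r, m], x]. The matrix $\Sigma$ must be of the form $\sigma_{ij} = \sigma_i^2$ for $i = j$ and $\sigma_{ij} = \sigma_i \sigma_j \lambda_i \lambda_j$ for $i \neq j$, where $\lambda_i \in [-1, 1]$. Similarly, the matrix $R$ must be of the form $r_{ij} = 1$ for $i = j$ and $r_{ij} = \lambda_i \lambda_j$ for $i \neq j$, where $\lambda_i \in [-1, 1]$. A reference for this method of calculating the cumulative distribution function may be found in Y. L. Tong, The Multivariate Normal Distribution, Springer-Verlag, 1990.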
Note that for a vector-valued distribution such as MultinormalDistribution or MultivariateTDistribution, functions like Mean, Variance, and Kurtosis give a vector-valued result since they are applied to each coordinate of the vector. Similarly, for a matrix-valued distribution, such as WishartDistribution, these functions give a matrix-valued result.
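A short illustration of the coordinate-wise behavior may help. This is a sketch that assumes the package is loaded; dist and its parameter values are hypothetical and chosen only so that the coordinates are easy to tell apart.

```mathematica
<<Statistics`MultinormalDistribution`

(* illustrative binormal distribution with distinct means and variances *)
dist = MultinormalDistribution[{1, 2}, {{1, 1/2}, {1/2, 2}}];

Mean[dist]       (* one entry per coordinate: the mean vector *)
Variance[dist]   (* one entry per coordinate: the diagonal of the covariance matrix *)
```

Because Variance reports only the per-coordinate values, the off-diagonal covariance 1/2 does not appear in its result.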
This loads the package.
In[1]:= <<Statistics`MultinormalDistribution`
Here is a symbolic representation of a standardized binormal distribution. A standardized random vector has a zero mean vector and a covariance matrix equal to its correlation matrix.
In[2]:= (r = {{1, 1/Sqrt[3]}, {1/Sqrt[3], 1}}; ndist = MultinormalDistribution[{0, 0}, r])
Out[2]=
This gives its probability density function.
In[3]:= pdf = PDF[ndist, {x1, x2}]
Out[3]=
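For reference, the density of a standardized binormal distribution with correlation $\rho$ can be written in closed form. The expression below substitutes $\rho = 1/\sqrt{3}$, the off-diagonal entry of r as entered above; the form that Mathematica prints may be arranged differently.

```latex
f(x_1, x_2)
  = \frac{1}{2\pi\sqrt{1-\rho^2}}
    \exp\!\left(-\frac{x_1^2 - 2\rho\,x_1 x_2 + x_2^2}{2\,(1-\rho^2)}\right)
  = \frac{\sqrt{3}}{2\sqrt{2}\,\pi}
    \exp\!\left(-\frac{3}{4}\left(x_1^2 - \frac{2}{\sqrt{3}}\,x_1 x_2 + x_2^2\right)\right),
\qquad \rho = \frac{1}{\sqrt{3}}
```

Here $1 - \rho^2 = 2/3$, which gives both the normalizing constant and the factor $3/4$ in the exponent.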
You can make a plot of the density to observe its distribution.
In[4]:= Plot3D[pdf, {x1, -3, 3}, {x2, -3, 3}, PlotRange -> All]
Out[4]=
Here is the probability that the distribution assigns to the region $x_1 \le 1$, $x_2 \le 1$.
In[5]:= CDF[ndist, {1, 1}]
Out[5]=
This gives the domain of the quadratic form distribution qdist.
In[6]:= (qdist = QuadraticFormDistribution[{{{8, 4}, {4, 3}}, {2, 1}, 6}, {{1, 1}, {{1, 1}, {1, 2}}}]; Domain[qdist])
Out[6]=
The series expansion of the PDF of the quadratic form distribution can be plotted. A 20-term expansion is clearly poor for large $x$.
In[7]:= (polynomial = Normal[Series[PDF[qdist, x], {x, 47/8, 20}]]; Plot[polynomial, {x, 47/8, 50}])
Out[7]=
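The expansion point 47/8 used above is the lower support point of qdist, and it can be checked by hand: for a positive definite matrix in the quadratic term, the quadratic form attains its minimum where its gradient vanishes. The sketch below uses only kernel functions; the names a, b, c, zmin, and lower are illustrative and simply copy the parameters entered for qdist.

```mathematica
a = {{8, 4}, {4, 3}}; b = {2, 1}; c = 6;   (* quadratic form parameters of qdist *)

zmin  = -Inverse[a].b/2            (* point where the gradient 2 a.z + b vanishes: {-1/8, 0} *)
lower = zmin.a.zmin + b.zmin + c   (* minimum value of the quadratic form: 47/8 *)
```

The minimum depends only on the quadratic form parameters, not on the mean vector or covariance matrix of the underlying multinormal vector, because a nondegenerate multinormal vector can take any value.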
Many of the multivariate distributions have hidden arguments that are evaluated when the distribution is first entered. Random variate generation will be more efficient if these arguments are evaluated only once.
This is an inefficient means of computing 1000 multinormal variates because the Cholesky decomposition of the covariance matrix is computed for each variate.
In[8]:= (mu = {1, 2, 3, 4}; sigma = {{1, 1/2, 1/3, 1/4}, {1/2, 1/3, 1/4, 1/5}, {1/3, 1/4, 1/5, 1/6}, {1/4, 1/5, 1/6, 1/7}}; Timing[Table[Random[MultinormalDistribution[mu, sigma]], {1000}]][[1]])
Out[8]=
This method of generating 1000 variates is more efficient because the Cholesky decomposition is computed once.
In[9]:= Timing[RandomArray[ MultinormalDistribution[mu, sigma], 1000]][[1]]
Out[9]=
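The saving comes from factoring the covariance matrix once and reusing the factor for every variate. The following sketch shows the idea with kernel linear algebra; it is not the package's internal code, and it assumes that Statistics`NormalDistribution` supplies NormalDistribution and that RandomArray accepts a list of dimensions. The names u, z, and x are illustrative.

```mathematica
<<Statistics`NormalDistribution`   (* assumed source of NormalDistribution and RandomArray *)

mu = {1, 2, 3, 4};
sigma = Table[1/(i + j - 1), {i, 4}, {j, 4}];   (* same matrix as sigma above *)

(* upper triangular factor with Transpose[u].u equal to sigma; computed only once *)
u = CholeskyDecomposition[N[sigma]];

z = RandomArray[NormalDistribution[0, 1], {1000, 4}];   (* independent standard normals *)
x = Map[(mu + #) &, z.u];   (* each row has mean mu and covariance sigma *)
```

Each row of z.u has covariance Transpose[u].u, so one matrix product replaces 1000 separate decompositions.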
Functions of univariate statistical distributions not applicable to multivariate distributions.
In the multivariate case, it is difficult to define Quantile as the inverse of the CDF, since many values of the random vector (or random matrix) correspond to a single probability value. This package defines Quantile only for the univariate distribution HotellingTSquareDistribution and for some minor degenerate cases of the other distributions. The elliptically contoured distributions MultinormalDistribution and MultivariateTDistribution support EllipsoidQuantile and its inverse RegionProbability.
Functions of vectorvalued multivariate statistical distributions.
This gives the ellipse centered on the mean that encloses 50% of the ndist distribution.
In[10]:= ellipse = EllipsoidQuantile[ndist, .5]
Out[10]=
This gives the probability of the distribution within the ellipse. Note that the ellipse must correspond to a constant-probability contour of the prescribed distribution.
In[11]:= RegionProbability[ndist, ellipse]
Out[11]=
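For the multinormal case, the ellipse can be described in closed form. The relation below assumes that EllipsoidQuantile follows the usual definition of a constant-density region enclosing probability $q$; the symbol $\chi^2_{p;\,q}$ denotes the $q$ quantile of the chi-square distribution with $p$ degrees of freedom.

```latex
\left\{\, x : (x - \mu)^T \Sigma^{-1} (x - \mu) \le \chi^2_{p;\,q} \,\right\},
\qquad
p = 2:\quad \chi^2_{2;\,q} = -2\ln(1 - q),
\quad
\chi^2_{2;\,0.5} = 2\ln 2 \approx 1.386
```

Under that assumption, the 50% ellipse for ndist is the set of points whose squared Mahalanobis distance from the origin is at most $2 \ln 2$.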
As $m \to \infty$, the elliptical contour of MultivariateTDistribution[r, m] approaches the elliptical contour of a multinormal distribution with zero mean vector and covariance matrix r.
In[12]:= Show[Graphics[{ellipse, {Dashing[{.04, .02}], EllipsoidQuantile[MultivariateTDistribution[r, 2], .5]}, {Dashing[{.02, .04}], EllipsoidQuantile[MultivariateTDistribution[r, 1], .5]}}], Axes -> True]
Out[12]=
