This is documentation for Mathematica 5, which was
based on an earlier version of the Wolfram Language.
View current documentation (Version 11.1)

Documentation / Mathematica / Add-ons & Links / Standard Packages / Introduction /

Statistics Packages

Mathematica users from a wide range of disciplines count on the basic statistical functions provided in the standard add-on packages to do their work. Continuous and discrete univariate and multivariate statistical distributions are supported. There are packages offering descriptive statistics for univariate and multivariate data, data manipulation and smoothing, and classical hypothesis testing and confidence interval estimation. Both linear and nonlinear regression functions permit users to take advantage of an extensive collection of diagnostics.

This causes each Statistics package to be loaded as needed.

In[1]:= <<Statistics`

About of this simulated data derives from a binormal distribution with nearly circular contours located at the origin. The remainder derives from a binormal having highly elliptical contours located in the lower left quadrant.

In[2]:= (dist0 = MultinormalDistribution[{0, 0},
{{1, .1}, {.1, 1}}];
dist1 = MultinormalDistribution[{-3, -2},
{{3.2, -2.9}, {-2.9, 2.8}}];
ListPlot[data = Table[If[Random[] < .9,
Random[dist0], Random[dist1]], {100}],
AspectRatio->1, PlotRange -> All])


The population distribution is bimodal.

In[3]:= ({{xmin, xmax}, {ymin, ymax}} = Map[{Min[#], Max[#]}&,
ContourPlot[.9 PDF[dist0, {x, y}] +
.1 PDF[dist1, {x, y}],
{x, xmin, xmax}, {y, ymin, ymax},
ContourShading -> False])


You can visualize the sample distribution by assuming a binormal form and using the sample covariance matrix. But this hides the bimodal nature of the data.



In the previous example, the observations followed a bivariate statistical distribution. You can also simulate data where the value of the independent variable is known with relative certainty, while the measured response is assumed to follow a univariate statistical distribution.

The nonrandom component of the response is a sum of Gaussians, one centered at and another centered at .



Here is a plot of the data, two Gaussian peaks contaminated by noise (uniformly distributed between and ), perhaps simulating mass spectrometer measurements. From the plot you might guess that there is a peak near and another near .

In[6]:= (data = Table[{x, peaks + Random[Real, 1.5]},
{x, 1, 30}];
dataplot = ListPlot[data])


A nonlinear model is chosen: a sum of Gaussians with different locations (, ) and amplitudes (, ), but the same width ().



A nonlinear model requires good starting values for the parameters to initialize the search for the least-squares estimates. These can be guessed at by looking at the plot. Here is the model at our initial guess for the parameter values.



The "true" values for , , , and (, , , ) lie within the confidence intervals NonlinearRegress provides for the parameter estimates. The true value for () falls outside the interval.



Diagnostics provide an evaluation of the least-squares fit. Here the maximum relative intrinsic curvature is smaller than the relative curvature of the confidence region, but not by much. The data analyst should be skeptical about whether the intrinsic form of the chosen model is appropriate. The maximum relative parameter-effects curvature is greater than the threshold, indicating that the model parametrization could be improved.

Recall that the error in the simulated data was uniformly distributed between and , so there is actually a background constant term of unaccounted for in the model. Including the background in the model should yield a better fit.

In[10]:= (trueplot = Plot[peaks, {x, 1, 30},
DisplayFunction -> Identity];
estimatedplot = Plot[(model /. (BestFitParameters /. regress)),
{x, 1, 30}, DisplayFunction -> Identity];
Show[dataplot, trueplot, estimatedplot])