This is documentation for Mathematica 8, which was
based on an earlier version of the Wolfram Language.

# PearsonChiSquareTest

 PearsonChiSquareTest[data] tests whether data is normally distributed using the Pearson $\chi^2$ test.

 PearsonChiSquareTest[data, dist] tests whether data is distributed according to dist using the Pearson $\chi^2$ test.

 PearsonChiSquareTest[data, dist, "property"] returns the value of "property".
• PearsonChiSquareTest performs the Pearson $\chi^2$ goodness-of-fit test with null hypothesis $H_0$ that data was drawn from a population with distribution dist, and alternative hypothesis $H_a$ that it was not.
• By default, a probability value or $p$-value is returned.
• A small $p$-value suggests that it is unlikely that the data came from dist.
• The dist can be any symbolic distribution with numeric and symbolic parameters or a dataset.
• The data can be univariate $\{x_1, x_2, \ldots\}$ or multivariate $\{\{x_1, y_1, \ldots\}, \{x_2, y_2, \ldots\}, \ldots\}$.
• The Pearson test effectively compares a histogram of data to a theoretical histogram based on dist.
• For univariate data the test statistic is given by $\sum_{i=1}^{k} (o_i - e_i)^2 / e_i$, where $o_i$ and $e_i$ are the observed and expected counts for the $i$th of $k$ histogram bins, respectively.
• For multivariate tests, the mean of the univariate marginal test statistics is used. $p$-values are computed via Monte Carlo simulation.
• Properties related to the reporting of test results include:

| Property | Description |
| --- | --- |
| "DegreesOfFreedom" | the degrees of freedom used in a test |
| "PValue" | $p$-value |
| "PValueTable" | formatted version of "PValue" |
| "ShortTestConclusion" | a short description of the conclusion of a test |
| "TestConclusion" | a description of the conclusion of a test |
| "TestData" | test statistic and $p$-value |
| "TestDataTable" | formatted version of "TestData" |
| "TestStatistic" | test statistic |
| "TestStatisticTable" | formatted "TestStatistic" |

• The following properties are independent of which test is being performed.
• Properties related to the data distribution include:

| Property | Description |
| --- | --- |
| "FittedDistribution" | fitted distribution of data |
| "FittedDistributionParameters" | distribution parameters of data |

• The following options can be given:

| Option | Default | Description |
| --- | --- | --- |
| Method | Automatic | the method to use for computing $p$-values |
| SignificanceLevel | 0.05 | cutoff for diagnostics and reporting |

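• For a test for goodness of fit, a cutoff $\alpha$ is chosen such that the null hypothesis $H_0$ is rejected only if $p < \alpha$. The value of $\alpha$ used for the "TestConclusion" and "ShortTestConclusion" properties is controlled by the SignificanceLevel option. By default, $\alpha$ is set to $0.05$.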
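
As a small worked illustration of the test statistic and the rejection rule described in the notes above, suppose 40 points fall into 4 bins that are equiprobable under the null hypothesis, with observed counts 12, 8, 11 and 9. These counts are made up for illustration only. Each expected count is then 10, and the statistic is:

$$
\chi^2 = \frac{(12-10)^2}{10} + \frac{(8-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(9-10)^2}{10} = \frac{4+4+1+1}{10} = 1.0
$$

Assuming dist is fully specified and all 4 bins are non-empty, the reference distribution has 3 degrees of freedom. The 5% critical value of a chi-square distribution with 3 degrees of freedom is about 7.81, so a statistic of 1.0 does not lead to rejection at a significance level of 0.05.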
Perform the Pearson test for normality:
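
The original input for this example is not reproduced here. A minimal sketch, assuming a simulated sample (the seed, sample size and variable name `data` are illustrative choices, not taken from the original page):

```wolfram
SeedRandom[1];  (* arbitrary seed, only for reproducibility *)
data = RandomVariate[NormalDistribution[], 200];  (* simulated normal sample *)

(* With a single argument, the data is tested for normality;
   a p-value is returned by default *)
pNormal = PearsonChiSquareTest[data]
```

A large value of `pNormal` would indicate no evidence against normality for this sample.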
Test the fit of some data to a particular distribution:
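
A hedged sketch of this usage, assuming simulated exponential data (the choice of ExponentialDistribution and all variable names are illustrative assumptions):

```wolfram
SeedRandom[2];  (* arbitrary seed *)
expData = RandomVariate[ExponentialDistribution[1], 200];

(* Fit to the distribution the data was actually drawn from *)
pGood = PearsonChiSquareTest[expData, ExponentialDistribution[1]]

(* Fit to a distribution the data was not drawn from *)
pBad = PearsonChiSquareTest[expData, NormalDistribution[0, 1]]
```

One would expect `pBad` to be much smaller than `pGood`, since all of the simulated values are positive while a standard normal puts half of its mass below zero.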
Compare the distributions of two datasets:
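
A minimal sketch of a two-sample comparison, assuming simulated datasets (names, seed and parameters are illustrative). Since dist may be a dataset, the second argument is simply the other sample:

```wolfram
SeedRandom[3];  (* arbitrary seed *)
data1 = RandomVariate[NormalDistribution[0, 1], 300];
data2 = RandomVariate[NormalDistribution[0, 1], 300];  (* same population as data1 *)
data3 = RandomVariate[NormalDistribution[1, 2], 300];  (* shifted and wider *)

pSame = PearsonChiSquareTest[data1, data2]
pDifferent = PearsonChiSquareTest[data1, data3]
```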
Extract the test statistic from the Pearson test:
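
A sketch of property extraction through the third argument, assuming a simulated sample and an explicitly specified test distribution (both are illustrative choices):

```wolfram
SeedRandom[4];  (* arbitrary seed *)
sample = RandomVariate[NormalDistribution[], 200];

(* Ask for the test statistic instead of the default p-value *)
stat = PearsonChiSquareTest[sample, NormalDistribution[0, 1], "TestStatistic"]
```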
## Scope
Perform a Pearson test for normality:
The $p$-value for the normal data is large compared to the $p$-value for the non-normal data:
Test the goodness of fit to a particular distribution:
Compare the distributions of two datasets:
The two datasets do not have the same distribution:
Test for multivariate normality:
Test for goodness of fit to any multivariate distribution:
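
A hedged sketch of both multivariate cases, assuming simulated bivariate normal data (the mean vector, covariance matrix and names are illustrative). Multivariate $p$-values are computed by Monte Carlo simulation, so results can vary slightly between runs:

```wolfram
SeedRandom[9];  (* arbitrary seed *)
mdist = MultinormalDistribution[{0, 0}, {{1, 0.5}, {0.5, 1}}];
mdata = RandomVariate[mdist, 200];  (* 200 bivariate points *)

(* Multivariate normality, parameters not specified *)
pMultiNormal = PearsonChiSquareTest[mdata]

(* Goodness of fit to a fully specified multivariate distribution *)
pMultiDist = PearsonChiSquareTest[mdata, mdist]
```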
Create a HypothesisTestData object for repeated property extraction:
The properties available for extraction:
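
A minimal sketch of this workflow, assuming a simulated sample (the name `htd` and the data are illustrative). The test is computed once and the resulting object is queried as often as needed:

```wolfram
SeedRandom[10];  (* arbitrary seed *)
obs = RandomVariate[NormalDistribution[], 200];

(* Build the object once *)
htd = PearsonChiSquareTest[obs, NormalDistribution[0, 1], "HypothesisTestData"];

htd["Properties"]      (* list the available properties *)
htd["TestDataTable"]   (* formatted statistic and p-value *)
{statistic, pValue} = htd["TestData"]  (* raw values for custom reporting *)
```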
Tabulate the results of the Pearson test:
The full test table:
A $p$-value table:
The test statistic:
Retrieve the entries from a Pearson test table for custom reporting:
Report test conclusions using "ShortTestConclusion" and "TestConclusion":
The conclusion may differ at a different significance level:
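
A sketch of conclusion reporting, assuming simulated data that does not follow the tested distribution (data and names are illustrative). The SignificanceLevel option sets the cutoff used when the conclusion is worded:

```wolfram
SeedRandom[11];  (* arbitrary seed *)
tData = RandomVariate[StudentTDistribution[3], 100];  (* heavier tails than a normal *)
testDist = NormalDistribution[0, 1];

(* Conclusions at the default significance level of 0.05 *)
shortDefault = PearsonChiSquareTest[tData, testDist, "ShortTestConclusion"]
longDefault = PearsonChiSquareTest[tData, testDist, "TestConclusion"]

(* The same test reported at a much stricter level *)
shortStrict = PearsonChiSquareTest[tData, testDist, "ShortTestConclusion",
  SignificanceLevel -> 0.001]
```

Whether the two short conclusions differ depends on where the $p$-value for the particular sample falls relative to 0.05 and 0.001.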
## Options
Use Monte Carlo-based methods or a computation formula:
Set the number of samples to use for Monte Carlo-based methods:
The Monte Carlo estimate converges to the true $p$-value with increasing samples:
Set the random seed used in Monte Carlo-based methods:
The seed affects the state of the generator and has some effect on the resulting $p$-value:
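
A hedged sketch of the Method settings discussed above. The suboption names "MonteCarloSamples" and "RandomSeed" are given as I recall them from the hypothesis-test documentation and should be checked against the installed version; the data is simulated:

```wolfram
SeedRandom[5];  (* arbitrary seed for the data *)
smallData = RandomVariate[NormalDistribution[], 50];
h0 = NormalDistribution[0, 1];

pAuto = PearsonChiSquareTest[smallData, h0]  (* Method -> Automatic *)
pMC = PearsonChiSquareTest[smallData, h0, Method -> "MonteCarlo"]

(* More simulated samples give a more stable estimate, at higher cost *)
pMC1000 = PearsonChiSquareTest[smallData, h0,
  Method -> {"MonteCarlo", "MonteCarloSamples" -> 1000}]

(* Fixing the seed makes the Monte Carlo estimate repeatable *)
pSeeded = PearsonChiSquareTest[smallData, h0,
  Method -> {"MonteCarlo", "RandomSeed" -> 42}]
```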
## Applications
A power curve for the Pearson test:
Visualize the approximate power curve:
Estimate the power of the Pearson test when the underlying distribution is UniformDistribution, the test size is 0.05, and the sample size is 12:
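
A rough simulation sketch of such a power estimate. The parameters of the uniform distribution are not given in the text above, so the interval {-2, 2} is a placeholder assumption, as are the number of trials and all names. Power is estimated as the fraction of simulated samples for which normality is rejected at the 0.05 level:

```wolfram
SeedRandom[6];  (* arbitrary seed *)
trials = 500;   (* number of simulated samples; an arbitrary choice *)

(* p-values of the normality test on uniform samples of size 12 *)
pvals = Table[
   PearsonChiSquareTest[RandomVariate[UniformDistribution[{-2, 2}], 12]],
   {trials}];

(* Fraction of rejections at test size 0.05 *)
power = N[Count[pvals, p_ /; p < 0.05]/trials]
```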
The number of auto accidents was recorded for a city over the course of 30 days. The city council is planning on lowering speed limits in the city and wants a model of the accident rate as a baseline for later comparison:
Count data is often modeled well by PoissonDistribution:
Suppose the city collected data over another 30-day period after reducing the speed limit. Compare the distributions before and after the reduction:
The distributions are significantly different:
By default, univariate data is compared to NormalDistribution:
The parameters have been estimated from the data:
Multivariate data is compared to MultinormalDistribution by default:
The parameters of the test distribution are estimated from the data if not specified:
Specified parameters are not estimated:
Maximum likelihood estimates are used for unspecified parameters of the test distribution:
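
A minimal sketch of the estimation behavior described above, assuming simulated data and the symbol names `m` and `s` for the unspecified parameters (all illustrative). The fitted distribution reported by the test can be compared with a direct maximum likelihood fit from EstimatedDistribution:

```wolfram
SeedRandom[7];  (* arbitrary seed *)
Clear[m, s];    (* make sure the parameter symbols are unassigned *)
fitData = RandomVariate[NormalDistribution[3, 2], 500];

(* Both parameters symbolic: both are estimated from the data *)
fittedBoth = PearsonChiSquareTest[fitData, NormalDistribution[m, s], "FittedDistribution"]

(* Direct maximum likelihood fit, for comparison *)
mle = EstimatedDistribution[fitData, NormalDistribution[m, s]]

(* Standard deviation specified: only the mean is estimated *)
fittedMean = PearsonChiSquareTest[fitData, NormalDistribution[m, 2], "FittedDistribution"]
```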
PearsonChiSquareTest effectively compares the observed and expected histograms:
The data is binned into approximately $\lceil 2 n^{2/5} \rceil$ bins that are equiprobable under $H_0$:
Under $H_0$, each bin is expected to contain an equal number of points:
Observed histograms when $H_0$ is true and false, respectively:
The degrees of freedom are equal to the number of non-empty bins minus one:
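
The histogram comparison can be sketched by hand. This is an illustration under stated assumptions, not the built-in implementation: the bin-count rule `Ceiling[2 n^(2/5)]` is assumed, the null distribution is fully specified, and the data is simulated. Mapping the data through the CDF of the null distribution turns equiprobable bins into equal-width bins on the unit interval:

```wolfram
SeedRandom[8];  (* arbitrary seed *)
nullDist = NormalDistribution[0, 1];  (* fully specified null distribution *)
x = RandomVariate[nullDist, 100];
n = Length[x];
k = Ceiling[2 n^(2/5)];  (* assumed rule for the number of bins *)

(* Under the null hypothesis the transformed values are uniform on (0, 1) *)
u = CDF[nullDist, #] & /@ x;
observed = BinCounts[u, {0, 1, 1/k}];  (* k equal-width bins *)
expected = n/k;                        (* same expected count in every bin *)

chi2 = N[Total[(observed - expected)^2/expected]]
dof = Count[observed, c_ /; c > 0] - 1  (* non-empty bins minus one *)
pManual = SurvivalFunction[ChiSquareDistribution[dof], chi2]
```

The result can be compared with `PearsonChiSquareTest[x, nullDist, "TestData"]`; agreement depends on whether the built-in binning matches the assumptions made here.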
One degree of freedom is removed for each parameter that is estimated from the data:
If the parameters are unknown, PearsonChiSquareTest corrects the degrees of freedom:
No correction is applied when the parameters are specified:
The fitted distribution is equivalent but the degrees of freedom and $p$-value are corrected:
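
A hedged sketch of the degrees-of-freedom correction, assuming simulated data and illustrative names. The same fitted distribution is tested twice: once with symbolic parameters, so the test knows they were estimated, and once passed in as a fully specified distribution:

```wolfram
SeedRandom[12];  (* arbitrary seed *)
Clear[m, s];     (* parameter symbols must be unassigned *)
dofData = RandomVariate[NormalDistribution[1, 3], 300];

(* Parameters unknown: estimated by the test, degrees of freedom corrected *)
dofEstimated = PearsonChiSquareTest[dofData, NormalDistribution[m, s], "DegreesOfFreedom"]

(* Same fit supplied as a specified distribution: no correction *)
preFitted = EstimatedDistribution[dofData, NormalDistribution[m, s]];
dofSpecified = PearsonChiSquareTest[dofData, preFitted, "DegreesOfFreedom"]
```

Following the rule of one degree of freedom removed per estimated parameter, one would expect `dofEstimated` to be smaller than `dofSpecified` by 2 here.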
The Pearson statistic asymptotically follows ChiSquareDistribution under $H_0$:
Independent marginal densities are assumed in tests for multivariate goodness of fit:
The test statistic is identical when independence is assumed:
The distribution of the Pearson test statistic:
New in 8