PearsonChiSquareTest

PearsonChiSquareTest[data]

tests whether data is normally distributed using the Pearson test.

PearsonChiSquareTest[data,dist]

tests whether data is distributed according to dist using the Pearson test.

PearsonChiSquareTest[data,dist,"property"]

returns the value of "property".

Details and Options

PearsonChiSquareTest performs the Pearson goodness-of-fit test with null hypothesis $H_0$ that data was drawn from a population with distribution dist, and alternative hypothesis $H_a$ that it was not.
By default, a probability value or p-value is returned.
A small p-value suggests that it is unlikely that the data came from dist.
The dist can be any symbolic distribution with numeric and symbolic parameters or a dataset.
The data can be univariate {x₁,x₂,…} or multivariate {{x₁,y₁,…},{x₂,y₂,…},…}.
The Pearson test effectively compares a histogram of data to a theoretical histogram based on dist. The bins are chosen to have equal probability in dist.
For univariate data, the test statistic is given by $\sum_{i=1}^{k} (O_i - E_i)^2 / E_i$, where $O_i$ and $E_i$ are the observed and expected counts for the $i$-th of $k$ histogram bins, respectively.
For multivariate tests, the sum of the univariate marginal p-values is used and is assumed to follow a UniformSumDistribution under $H_0$.
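The univariate statistic can be reproduced by hand for a fully specified distribution. The following is a minimal sketch under stated assumptions: the bin count k = 10 is chosen only for illustration and need not match the number of bins PearsonChiSquareTest selects internally, so the value is not expected to agree exactly with the built-in result.

```wolfram
SeedRandom[1];
data = RandomVariate[NormalDistribution[], 200];
dist = NormalDistribution[0, 1];   (* fully specified null distribution *)
k = 10;                            (* assumed bin count, for illustration only *)

(* Map each point through the CDF of dist; equal-width bins on [0, 1]
   then correspond to bins of equal probability under dist *)
u = CDF[dist, #] & /@ data;
observed = BinCounts[u, {0, 1, 1/k}];
expected = Length[data]/k;         (* same expected count in every bin *)

(* Pearson statistic: sum over bins of (O - E)^2 / E *)
statistic = N[Total[(observed - expected)^2/expected]]
```

Transforming through the CDF avoids computing bin edges in the original scale, since equiprobable bins under dist become equal-width bins on the unit interval.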
PearsonChiSquareTest[data,dist,"HypothesisTestData"] returns a HypothesisTestData object htd that can be used to extract additional test results and properties using the form htd["property"].
PearsonChiSquareTest[data,dist,"property"] can be used to directly give the value of "property".
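As a hedged illustration of both forms, the sketch below builds a HypothesisTestData object once and queries it several times, then requests a single property directly. The simulated data and the name htd are placeholders.

```wolfram
SeedRandom[2];
data = RandomVariate[NormalDistribution[], 100];

(* Build the object once, then extract several properties from it *)
htd = PearsonChiSquareTest[data, NormalDistribution[0, 1], "HypothesisTestData"];
htd["Properties"]
{htd["TestStatistic"], htd["PValue"], htd["DegreesOfFreedom"]}

(* Request a single property directly *)
PearsonChiSquareTest[data, NormalDistribution[0, 1], "PValue"]
```

Reusing the object avoids rerunning the test for each property.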
Properties related to the reporting of test results include:

	"DegreesOfFreedom"	the degrees of freedom used in a test
	"PValue"	p-value
	"PValueTable"	formatted version of "PValue"
	"ShortTestConclusion"	a short description of the conclusion of a test
	"TestConclusion"	a description of the conclusion of a test
	"TestData"	test statistic and p-value
	"TestDataTable"	formatted version of "TestData"
	"TestStatistic"	test statistic
	"TestStatisticTable"	formatted version of "TestStatistic"

The following properties are independent of which test is being performed.
Properties related to the data distribution include:

	"FittedDistribution"	fitted distribution of data
	"FittedDistributionParameters"	distribution parameters of data

The following options can be given:

	Method	Automatic	the method to use for computing p-values
	SignificanceLevel	0.05	cutoff for diagnostics and reporting

For a test for goodness of fit, a cutoff $\alpha$ is chosen such that $H_0$ is rejected only if $p < \alpha$. The value of $\alpha$ used for the "TestConclusion" and "ShortTestConclusion" properties is controlled by the SignificanceLevel option. By default, $\alpha$ is set to 0.05.
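A short sketch of how the cutoff enters the reported conclusion, using simulated data. Whether the two conclusions differ depends on the sample, so no particular outcome is implied.

```wolfram
SeedRandom[3];
data = RandomVariate[NormalDistribution[], 100];

(* Conclusion at the default significance level of 0.05 *)
PearsonChiSquareTest[data, NormalDistribution[0, 1], "ShortTestConclusion"]

(* Conclusion at a stricter cutoff *)
PearsonChiSquareTest[data, NormalDistribution[0, 1], "ShortTestConclusion",
 SignificanceLevel -> 0.01]
```

The p-value itself is unaffected by SignificanceLevel; only the wording of the conclusion properties changes.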
With the setting Method->"MonteCarlo", datasets $s_i$ of the same length as the input are generated under $H_0$ using the fitted distribution. The EmpiricalDistribution of the values PearsonChiSquareTest[s_i,dist,"TestStatistic"] is then used to estimate the p-value.
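A minimal sketch comparing the two ways of obtaining a p-value on the same simulated sample. The Monte Carlo result is an estimate and varies between runs, so the two values are only expected to be close.

```wolfram
SeedRandom[4];
data = RandomVariate[NormalDistribution[], 50];

(* p-value estimated by simulating datasets under the null hypothesis *)
PearsonChiSquareTest[data, NormalDistribution[0, 1], Method -> "MonteCarlo"]

(* p-value from the default method *)
PearsonChiSquareTest[data, NormalDistribution[0, 1], Method -> Automatic]
```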

Examples


Basic Examples (4)

Perform the Pearson test for normality:
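A minimal sketch with simulated data (the sample and its size are placeholders):

```wolfram
SeedRandom[5];
data = RandomVariate[NormalDistribution[], 1000];

(* With no distribution given, the data is tested for normality;
   the result is a p-value *)
PearsonChiSquareTest[data]
```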

Test the fit of some data to a particular distribution:
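For instance, assuming a simulated exponential sample tested against the distribution it was drawn from:

```wolfram
SeedRandom[6];
data = RandomVariate[ExponentialDistribution[2], 500];

(* Test against a fully specified distribution *)
PearsonChiSquareTest[data, ExponentialDistribution[2]]
```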

Compare the distributions of two datasets:
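A sketch in which the second argument is a dataset rather than a symbolic distribution. Both samples are simulated here purely for illustration:

```wolfram
SeedRandom[7];
data1 = RandomVariate[NormalDistribution[], 500];
data2 = RandomVariate[StudentTDistribution[2], 500];

(* A dataset given in place of dist serves as the reference distribution *)
PearsonChiSquareTest[data1, data2]
```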

Extract the test statistic from the Pearson test:
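A minimal sketch using the property form with simulated data:

```wolfram
SeedRandom[8];
data = RandomVariate[NormalDistribution[], 200];

(* Return the test statistic instead of the p-value *)
PearsonChiSquareTest[data, NormalDistribution[0, 1], "TestStatistic"]
```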

Scope (9)

Testing (6)

Perform a Pearson test for normality:

The p-value for the normal data is large compared to the p-value for the non-normal data:

Test the goodness of fit to a particular distribution:

Compare the distributions of two datasets:

The two datasets do not have the same distribution:

Test for multivariate normality:
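A hedged sketch with a simulated bivariate sample. The mean vector and covariance matrix are arbitrary choices for illustration:

```wolfram
SeedRandom[9];
data = RandomVariate[
   MultinormalDistribution[{0, 0}, {{1, 0.3}, {0.3, 1}}], 500];

(* Multivariate input with no distribution given is tested for
   multivariate normality *)
PearsonChiSquareTest[data]
```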

Test for goodness of fit to any multivariate distribution:

Create a HypothesisTestData object for repeated property extraction:

The properties available for extraction:

Reporting (3)

Tabulate the results of the Pearson test:

The full test table:

A p-value table:

The test statistic:
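The three table properties listed above can be requested from a single HypothesisTestData object. This sketch uses simulated data, and htd is a placeholder name:

```wolfram
SeedRandom[10];
data = RandomVariate[NormalDistribution[], 200];
htd = PearsonChiSquareTest[data, NormalDistribution[0, 1], "HypothesisTestData"];

htd["TestDataTable"]       (* statistic and p-value together *)
htd["PValueTable"]         (* p-value only *)
htd["TestStatisticTable"]  (* statistic only *)
```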

Retrieve the entries from a Pearson test table for custom reporting:

Report test conclusions using "ShortTestConclusion" and "TestConclusion":

The conclusion may differ at a different significance level:

Options (3)

Method (3)

Use Monte Carlo-based methods or a computation formula:

Set the number of samples to use for Monte Carlo-based methods:
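A sketch assuming the "MonteCarloSamples" suboption of the Monte Carlo method. The sample count 2000 is an arbitrary illustrative value:

```wolfram
SeedRandom[11];
data = RandomVariate[NormalDistribution[], 50];

(* Use 2000 simulated datasets for the Monte Carlo estimate *)
PearsonChiSquareTest[data, NormalDistribution[0, 1],
 Method -> {"MonteCarlo", "MonteCarloSamples" -> 2000}]
```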

The Monte Carlo estimate converges to the true p-value with increasing samples:

Set the random seed used in Monte Carlo-based methods:
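A sketch assuming the "RandomSeed" suboption of the Monte Carlo method. The seed values are arbitrary:

```wolfram
SeedRandom[12];
data = RandomVariate[NormalDistribution[], 50];

(* Same data, two different seeds for the Monte Carlo simulation *)
PearsonChiSquareTest[data, NormalDistribution[0, 1],
 Method -> {"MonteCarlo", "RandomSeed" -> 1}]
PearsonChiSquareTest[data, NormalDistribution[0, 1],
 Method -> {"MonteCarlo", "RandomSeed" -> 2}]
```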

The seed affects the state of the generator and has some effect on the resulting p-value:

Applications (2)

A power curve for the Pearson test:

Visualize the approximate power curve:

Estimate the power of the Pearson test when the underlying distribution is UniformDistribution[{-4,4}], the test size is 0.05, and the sample size is 12:
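One way to approximate this power is by simulation: draw many samples of size 12 from the uniform distribution, test each for normality, and record how often the p-value falls below 0.05. The number of trials (1000) is an assumption made for illustration, and the estimate carries simulation error.

```wolfram
SeedRandom[13];
trials = 1000;  (* assumed number of simulated samples *)

(* p-values of the normality test on uniform samples of size 12 *)
pvals = Table[
   PearsonChiSquareTest[RandomVariate[UniformDistribution[{-4, 4}], 12]],
   {trials}];

(* Estimated power: fraction of rejections at the 0.05 level *)
power = N[Count[pvals, p_ /; p < 0.05]/trials]
```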

The number of auto accidents was recorded for a city over the course of 30 days. The city council is planning on lowering speed limits in the city and wants a model of the accident rate as a baseline for later comparison:

Count data is often modeled well by PoissonDistribution:

Suppose the city collected data over another 30-day period after reducing the speed limit. Compare the distributions before and after the reduction:

The distributions are significantly different:

Properties & Relations (10)

By default, univariate data is compared to NormalDistribution:

The parameters have been estimated from the data:

Multivariate data is compared to MultinormalDistribution by default:

The parameters of the test distribution are estimated from the data if not specified:

Specified parameters are not estimated:

Maximum likelihood estimates are used for unspecified parameters of the test distribution:

PearsonChiSquareTest effectively compares the observed and expected histograms:

The data is binned into approximately $k$ bins, with $k$ depending on the sample size, that are equiprobable under $H_0$:

Under $H_0$, each bin is expected to contain an equal number of points:

Observed histograms for when $H_0$ is true and false, respectively:

The degrees of freedom are equal to the number of non-empty bins minus one:

One degree of freedom is removed for each parameter that is estimated from the data:

If the parameters are unknown, PearsonChiSquareTest corrects the degrees of freedom:

No correction is applied when the parameters are specified:
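A sketch contrasting the two cases on the same simulated sample. The symbols mu and sigma are placeholders and must have no assigned values so that they are treated as unknown parameters:

```wolfram
SeedRandom[14];
data = RandomVariate[NormalDistribution[], 200];

(* Symbolic parameters are estimated from the data *)
PearsonChiSquareTest[data, NormalDistribution[mu, sigma], "DegreesOfFreedom"]

(* Fully specified parameters: nothing is estimated *)
PearsonChiSquareTest[data, NormalDistribution[0, 1], "DegreesOfFreedom"]
```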

The fitted distribution is equivalent, but the degrees of freedom and p-value are corrected:

The Pearson statistic asymptotically follows ChiSquareDistribution under $H_0$:
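A simulation sketch of this relationship for a fully specified null distribution. The sample size (200) and number of repetitions (500) are arbitrary, and the degrees of freedom are taken from one representative sample on the assumption that they are the same across samples of this size:

```wolfram
SeedRandom[15];
dist = NormalDistribution[0, 1];

(* Test statistics from repeated samples drawn under the null hypothesis *)
stats = Table[
   PearsonChiSquareTest[RandomVariate[dist, 200], dist, "TestStatistic"],
   {500}];

(* Degrees of freedom reported for one sample of the same size *)
df = PearsonChiSquareTest[RandomVariate[dist, 200], dist, "DegreesOfFreedom"];

(* Overlay the chi-square density on the histogram of simulated statistics *)
Show[
 Histogram[stats, Automatic, "PDF"],
 Plot[PDF[ChiSquareDistribution[df], x], {x, 0, Max[stats]},
  PlotStyle -> Red]]
```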

Independent marginal densities are assumed in tests for multivariate goodness of fit:

The test statistic is identical when independence is assumed:

The Pearson test works with the values only when the input is a TimeSeries:

Neat Examples (1)

Compute the statistic when the null hypothesis is true:

The test statistic given a particular alternative:

Compare the distributions of the test statistics:
