AndersonDarlingTest

AndersonDarlingTest[data]

tests whether data is normally distributed using the Anderson–Darling test.

AndersonDarlingTest[data,dist]

tests whether data is distributed according to dist using the Anderson–Darling test.

AndersonDarlingTest[data,dist,"property"]

returns the value of "property".

Details and Options

  • AndersonDarlingTest performs the Anderson–Darling goodness-of-fit test with null hypothesis $H_0$ that data was drawn from a population with distribution dist, and alternative hypothesis $H_a$ that it was not.
  • By default, a probability value or P-value is returned.
  • A small P-value suggests that it is unlikely that the data came from dist.
  • The dist can be any symbolic distribution with numeric and symbolic parameters or a dataset.
  • The data can be univariate {x1,x2,…} or multivariate {{x1,y1,…},{x2,y2,…},…}.
  • The Anderson–Darling test assumes that the data came from a continuous distribution.
  • The Anderson–Darling test effectively uses a test statistic based on $\mathbb{E}_{x \sim \mathrm{dist}}\left[\frac{(\hat{F}(x) - F(x))^2}{F(x)\,(1 - F(x))}\right]$, where $\hat{F}$ is the empirical CDF of data and $F$ is the CDF of dist.
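  • For univariate data, the test statistic is given by $A^2 = -n - \sum_{i=1}^{n} \frac{2i-1}{n}\left[\log F(x_i) + \log\left(1 - F(x_{n+1-i})\right)\right]$, where $\{x_1, x_2, \ldots, x_n\}$ is the sorted data.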
  • For multivariate tests, the sum of the univariate marginal P-values is used and is assumed to follow a UniformSumDistribution under $H_0$.
  • AndersonDarlingTest[data,dist,"HypothesisTestData"] returns a HypothesisTestData object htd that can be used to extract additional test results and properties using the form htd["property"].
  • AndersonDarlingTest[data,dist,"property"] can be used to directly give the value of "property".
  • Properties related to the reporting of test results include:
  • "PValue"               P-value
    "PValueTable"          formatted version of "PValue"
    "ShortTestConclusion"  a short description of the conclusion of a test
    "TestConclusion"       a description of the conclusion of a test
    "TestData"             test statistic and P-value
    "TestDataTable"        formatted version of "TestData"
    "TestStatistic"        test statistic
    "TestStatisticTable"   formatted "TestStatistic"
  • The following properties are independent of which test is being performed.
  • Properties related to the data distribution include:
  • "FittedDistribution"            fitted distribution of data
    "FittedDistributionParameters"  distribution parameters of data
  • The following options can be given:
  • Method              Automatic   the method to use for computing P-values
    SignificanceLevel   0.05        cutoff for diagnostics and reporting
  • For a test for goodness of fit, a cutoff $\alpha$ is chosen such that $H_0$ is rejected only if $p < \alpha$. The value of $\alpha$ used for the "TestConclusion" and "ShortTestConclusion" properties is controlled by the SignificanceLevel option. By default, $\alpha$ is set to 0.05.
  • With the setting Method->"MonteCarlo", datasets $s_i$ of the same length as the input are generated under $H_0$ using the fitted distribution. The EmpiricalDistribution of the values AndersonDarlingTest[$s_i$,dist,"TestStatistic"] is then used to estimate the P-value.
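As an illustration of the univariate formula above, here is a minimal sketch in Python (the page's own Wolfram Language examples are not reproduced). It assumes the null CDF is fully specified and passed as an ordinary function; `anderson_darling_statistic` is a hypothetical helper name, and clamping the CDF values away from 0 and 1 is a numerical safeguard added for the sketch, not something this page describes.

```python
import math
from statistics import NormalDist


def anderson_darling_statistic(data, cdf):
    """Univariate A^2 for a fully specified null CDF (hypothetical helper).

    A^2 = -n - sum_{i=1}^{n} (2i - 1)/n * [log F(x_i) + log(1 - F(x_{n+1-i}))],
    where x_1 <= ... <= x_n is the sorted data.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    eps = 1e-12  # keep F(x) inside (0, 1) so both logarithms are finite
    u = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    total = 0.0
    for i in range(1, n + 1):
        # u[i - 1] is F(x_i) and u[n - i] is F(x_{n+1-i}) in the 1-based formula
        total += (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
    return -n - total / n


if __name__ == "__main__":
    standard_normal = NormalDist(0.0, 1.0)
    centred = [-1.5, -0.8, -0.3, 0.0, 0.2, 0.7, 1.1, 1.6]
    shifted = [x + 3.0 for x in centred]
    print(anderson_darling_statistic(centred, standard_normal.cdf))
    print(anderson_darling_statistic(shifted, standard_normal.cdf))
```

The shifted sample gives a much larger $A^2$ than the centred one. Turning $A^2$ into a P-value additionally requires its null distribution, which this sketch does not attempt.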

Examples


Basic Examples  (3)

Perform an Anderson–Darling test for normality:

Test the fit of some data to a particular distribution:

Compare the distributions of two datasets:

Scope  (9)

Testing  (6)

Perform an Anderson–Darling test for normality:

The P-value for the normal data is large compared to the P-value for the non-normal data:

Test the goodness of fit for a particular distribution:

Compare the distributions of two datasets:

Test for multivariate normality:

Test for goodness of fit to any multivariate distribution:
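The Details describe the multivariate test as combining the univariate marginal P-values through a UniformSumDistribution (the Irwin–Hall distribution). A minimal Python sketch of that combination step, assuming the marginal P-values are already available and independent, and assuming the combined P-value is the lower-tail probability of their sum (small marginal P-values give a small sum). Both function names are hypothetical.

```python
import math


def uniform_sum_cdf(s, d):
    """CDF of the sum of d independent Uniform(0, 1) variables (Irwin-Hall law)."""
    if s <= 0:
        return 0.0
    if s >= d:
        return 1.0
    # Standard alternating-sum formula, valid for 0 < s < d.
    total = sum((-1) ** k * math.comb(d, k) * (s - k) ** d
                for k in range(math.floor(s) + 1))
    return total / math.factorial(d)


def combine_marginal_pvalues(pvalues):
    """Refer the sum of per-coordinate P-values to the Irwin-Hall distribution."""
    d = len(pvalues)
    if d == 0:
        raise ValueError("need at least one marginal P-value")
    return uniform_sum_cdf(sum(pvalues), d)


if __name__ == "__main__":
    print(combine_marginal_pvalues([0.40, 0.70]))  # unremarkable marginals
    print(combine_marginal_pvalues([0.01, 0.02]))  # both marginals look poor
```

The alternating sum loses floating-point precision when the dimension grows to dozens of coordinates; for a handful of marginals it is adequate.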

Create a HypothesisTestData object for repeated property extraction:

The properties available for extraction:

Reporting  (3)

Tabulate the results of the Anderson–Darling test:

The full test table:

A P-value table:

The test statistic:

Retrieve the entries from an Anderson–Darling test table for custom reporting:

Report test conclusions using "ShortTestConclusion" and "TestConclusion":

The conclusion may differ at a different significance level:

Options  (4)

Method  (3)

Use Monte Carlo-based methods or a computation formula:

Set the number of samples to use for Monte Carlo-based methods:

The Monte Carlo estimate converges to the true P-value with increasing samples:

Set the random seed used in Monte Carlo-based methods:

The seed affects the state of the generator and has some effect on the resulting P-value:
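The Monte Carlo procedure from the Details can be sketched in Python for the default normality test. This uses a hypothetical `monte_carlo_pvalue` helper, not the built-in implementation: it fits a normal distribution by maximum likelihood, simulates `samples` datasets of the same length from the fit with a seeded generator, refits each simulated dataset (an assumption of the sketch, so estimation effects are mimicked), and reports the fraction of simulated statistics at least as large as the observed one.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def ad_statistic(xs, cdf):
    """Univariate Anderson-Darling A^2 for a given null CDF."""
    xs = sorted(xs)
    n = len(xs)
    eps = 1e-12  # keep F(x) inside (0, 1) so both logarithms are finite
    u = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def fit_normal(xs):
    """Maximum likelihood normal fit: sample mean and divide-by-n standard deviation."""
    return NormalDist(fmean(xs), pstdev(xs))


def monte_carlo_pvalue(data, samples=1000, seed=None):
    """Hypothetical helper: parametric Monte Carlo P-value for a normality test."""
    rng = random.Random(seed)  # seed=None draws fresh entropy; an int makes runs repeatable
    fitted = fit_normal(data)
    observed = ad_statistic(data, fitted.cdf)
    exceed = 0
    for _ in range(samples):
        # Simulate a dataset of the same length under H0 from the fitted distribution,
        # then refit it before computing its statistic.
        sim = [rng.gauss(fitted.mean, fitted.stdev) for _ in range(len(data))]
        if ad_statistic(sim, fit_normal(sim).cdf) >= observed:
            exceed += 1
    # Empirical upper-tail probability of the observed statistic.
    return exceed / samples


if __name__ == "__main__":
    data = [0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.5, 1, 3, 10, 30, 100]
    for samples in (100, 1000):
        print(samples, monte_carlo_pvalue(data, samples=samples, seed=42))
```

Raising `samples` shrinks the Monte Carlo error roughly in proportion to $1/\sqrt{\text{samples}}$, and fixing `seed` makes the estimate reproducible; these play the same roles as the sample-count and seed settings described above.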

SignificanceLevel  (1)

Set the significance level used for "TestConclusion" and "ShortTestConclusion":

By default, $\alpha = 0.05$ is used:

Applications  (3)

It can be shown that a GammaDistribution[1,1/λ] is equivalent to an ExponentialDistribution[λ]. This conclusion is supported by simulation:

Perform the Anderson–Darling test, grouping each dataset with its expected value:

The resulting P-value distributions are approximately uniform, supporting the claim:
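The simulation argument can be sketched in Python under stated assumptions: λ = 2, 500 datasets of size 30, and a simulated null distribution standing in for whatever P-value computation AndersonDarlingTest performs. `simulate_gamma_pvalues` is a hypothetical name. Because every parameter of the exponential null is specified, $F(X)$ is uniform under $H_0$, so the null distribution of $A^2$ is simulated once from uniforms and reused for every dataset.

```python
import bisect
import math
import random


def ad_statistic(xs, cdf):
    """Univariate Anderson-Darling A^2 for a fully specified null CDF."""
    xs = sorted(xs)
    n = len(xs)
    eps = 1e-12  # keep F(x) inside (0, 1) so both logarithms are finite
    u = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def simulate_gamma_pvalues(lam=2.0, n=30, datasets=500, null_reps=2000, seed=0):
    rng = random.Random(seed)
    # One simulated null distribution of A^2, built from uniforms (identity CDF).
    null = sorted(ad_statistic([rng.random() for _ in range(n)], lambda t: t)
                  for _ in range(null_reps))

    def exponential_cdf(x):
        return 1.0 - math.exp(-lam * x)

    pvalues = []
    for _ in range(datasets):
        # random.gammavariate(alpha, beta) treats beta as a scale, so scale 1/lam
        # mirrors GammaDistribution[1, 1/lam].
        sample = [rng.gammavariate(1.0, 1.0 / lam) for _ in range(n)]
        stat = ad_statistic(sample, exponential_cdf)
        # Fraction of null statistics at least as large as the observed one.
        pvalues.append((null_reps - bisect.bisect_left(null, stat)) / null_reps)
    return pvalues


if __name__ == "__main__":
    ps = simulate_gamma_pvalues()
    counts = [0] * 10
    for p in ps:
        counts[min(int(p * 10), 9)] += 1
    print(counts)  # expected to be roughly equal across deciles if the claim holds
```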

A power curve for the Anderson–Darling test:

Visualize the approximate power curve:

Estimate the power of the Anderson–Darling test when the underlying distribution is a UniformDistribution[{-4,4}], the test size is 0.05, and the sample size is 6:
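The power estimate can be approximated by simulation. This Python sketch assumes the normal mean and standard deviation are estimated by maximum likelihood and takes the critical value from a simulated null distribution; the names `estimate_power` and `uniform_sampler` are hypothetical, and this is not how AndersonDarlingTest computes its P-values.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def ad_statistic(xs, cdf):
    """Univariate Anderson-Darling A^2 for a given null CDF."""
    xs = sorted(xs)
    n = len(xs)
    eps = 1e-12  # keep F(x) inside (0, 1) so both logarithms are finite
    u = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def normality_statistic(xs):
    """A^2 against a normal fitted by maximum likelihood (mean, divide-by-n SD)."""
    return ad_statistic(xs, NormalDist(fmean(xs), pstdev(xs)).cdf)


def estimate_power(sampler, n=6, alpha=0.05, reps=10000, seed=0):
    rng = random.Random(seed)
    # With estimated mean and SD the statistic is location-scale invariant, so
    # standard normal samples give its null distribution.
    null = sorted(normality_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
                  for _ in range(reps))
    critical = null[math.ceil((1.0 - alpha) * reps) - 1]  # empirical 1 - alpha quantile
    rejections = sum(normality_statistic(sampler(rng, n)) > critical
                     for _ in range(reps))
    return rejections / reps


def uniform_sampler(rng, n):
    return [rng.uniform(-4.0, 4.0) for _ in range(n)]


if __name__ == "__main__":
    print(estimate_power(uniform_sampler))
```

Replacing `uniform_sampler` with a normal sampler should return a value close to the size 0.05, which is a quick check that the simulated critical value is calibrated.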

Measurements were taken on 50 members from each of three iris species. The species setosa has been observed to be easy to identify, but the remaining two species, versicolor and virginica, are often confused:

The distributions of petal lengths for each species:

The distributions are equivalent for versicolor and virginica, which are very different from setosa:

Assume the following petal length measures are known for the populations:

The normal mixture appears to fit the petal length distribution well:

Properties & Relations  (9)

By default, univariate data is compared to a NormalDistribution:

The parameters have been estimated from the data:

Multivariate data is compared to a MultinormalDistribution by default:

The parameters of the test distribution are estimated from the data if not specified:

Specified parameters are not estimated:

Maximum likelihood estimates are used for unspecified parameters of the test distribution:

If the parameters are unknown, AndersonDarlingTest applies a correction when possible:

The parameters are estimated but no correction is applied:

The fitted distribution is the same as before and the P-value is corrected:
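The page does not state which correction AndersonDarlingTest applies. As an illustrative assumption only, this Python sketch uses one widely cited correction for the normal case with both parameters estimated: the modified statistic $A^{*2} = A^2\left(1 + 0.75/n + 2.25/n^2\right)$ commonly attributed to D'Agostino and Stephens, together with their piecewise approximation to its P-value. The helper names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def ad_statistic(xs, cdf):
    """Univariate Anderson-Darling A^2 for a given null CDF."""
    xs = sorted(xs)
    n = len(xs)
    eps = 1e-12  # keep F(x) inside (0, 1) so both logarithms are finite
    u = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def corrected_normal_pvalue(a2_star):
    """Piecewise P-value approximation for the modified statistic (normal case)."""
    if a2_star < 0.2:
        return 1.0 - math.exp(-13.436 + 101.14 * a2_star - 223.73 * a2_star ** 2)
    if a2_star < 0.34:
        return 1.0 - math.exp(-8.318 + 42.796 * a2_star - 59.938 * a2_star ** 2)
    if a2_star < 0.6:
        return math.exp(0.9177 - 4.279 * a2_star - 1.38 * a2_star ** 2)
    a = min(a2_star, 10.0)  # the last branch is meant for moderate values; cap it
    return math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)


def normality_test_estimated(data):
    """Hypothetical helper: fit N(mean, sd) by maximum likelihood, then correct A^2."""
    n = len(data)
    fitted = NormalDist(fmean(data), pstdev(data))  # MLE uses the divide-by-n SD
    a2 = ad_statistic(data, fitted.cdf)
    a2_star = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    return fitted, a2, corrected_normal_pvalue(a2_star)


if __name__ == "__main__":
    data = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.6, 4.6, 5.0]
    fitted, a2, p = normality_test_estimated(data)
    print(fitted, a2, p)
```

Treating the fitted distribution as if it were known instead compares the unmodified $A^2$ with the known-parameter null distribution; because estimation tends to shrink $A^2$, that comparison is conservative.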

Independent marginal densities are assumed in tests for multivariate goodness of fit:

The test statistic is identical when independence is assumed:

The Anderson–Darling test statistic:

The Anderson–Darling statistic can be defined using NExpectation:
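The expectation-based definition and the closed-form sum from the Details agree. A Python sketch (standing in for the NExpectation call, not reproducing it) makes this concrete: substituting $t = F(x)$ turns the expectation into an integral over $(0, 1)$, evaluated here with a composite midpoint rule on each interval where the empirical CDF is constant. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def ad_closed_form(u):
    """A^2 from sorted probability-integral-transformed data u_i = F(x_i)."""
    n = len(u)
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def ad_integral(u, steps=4000):
    """n * E[(Fhat - F)^2 / (F (1 - F))] with the expectation taken under the null.

    After t = F(x) the empirical CDF is the step function equal to i/n on
    (u_i, u_{i+1}), with u_0 = 0 and u_{n+1} = 1.
    """
    n = len(u)
    knots = [0.0] + list(u) + [1.0]
    total = 0.0
    for i in range(n + 1):
        a, b = knots[i], knots[i + 1]
        level = i / n  # empirical CDF value on this interval
        h = (b - a) / steps
        for k in range(steps):
            t = a + (k + 0.5) * h  # midpoint rule never evaluates t = 0 or t = 1
            total += (level - t) ** 2 / (t * (1.0 - t)) * h
    return n * total


if __name__ == "__main__":
    data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, 2.0]
    u = sorted(NormalDist(0.0, 1.0).cdf(x) for x in data)
    print(ad_closed_form(u), ad_integral(u))
```

The integrand is smooth on every interval (near $t = 0$ it reduces to $t/(1-t)$ and near $t = 1$ to $(1-t)/t$), so the midpoint rule converges quickly.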

The Anderson–Darling test works on the values only when the input is a TimeSeries:

Possible Issues  (2)

The Anderson–Darling test is not intended for discrete distributions:

The continuity correction typically does a good job of preserving the size of the test:

This may not be the case in some situations:

Use Monte Carlo methods or PearsonChiSquareTest in these cases:

The Anderson–Darling test is not valid for some distributions when parameters have been estimated from the data:

Provide parameter values if they are known:

Alternatively, use Monte Carlo methods to approximate the P-value:

Neat Examples  (1)

Compute the statistic when the null hypothesis is true:

The test statistic given a particular alternative:

Compare the distributions of the test statistics:

Introduced in 2010 (8.0)