DistributionFitTest

DistributionFitTest[data]

tests whether data is normally distributed.

DistributionFitTest[data,dist]

tests whether data is distributed according to dist.

DistributionFitTest[data,dist,"property"]

returns the value of "property".

Details and Options

  • DistributionFitTest performs a goodness-of-fit hypothesis test with null hypothesis H0 that data was drawn from a population with distribution dist and alternative hypothesis Ha that it was not.
  • By default, a probability value or p-value is returned.
  • A small p-value suggests that it is unlikely that the data came from dist.
  • The dist can be any symbolic distribution with numeric and symbolic parameters or a dataset.
  • The data can be univariate {x1,x2,…} or multivariate {{x1,y1,…},{x2,y2,…},…}.
  • DistributionFitTest[data,dist,Automatic] will choose the most powerful test that applies to data and dist for a general alternative hypothesis.
  • DistributionFitTest[data,dist,All] will choose all tests that apply to data and dist.
  • DistributionFitTest[data,dist,"test"] reports the p-value according to "test".
  • Many of the tests use the CDF F(x) of the test distribution dist and the empirical CDF F̂(x) of the data, as well as their difference d(x)=F̂(x)-F(x) and its mean d̄=Expectation[d(x),x\[Distributed]dist]. The CDFs F̂(x) and F(x) should be the same under the null hypothesis H0.
  • The following tests can be used for univariate or multivariate distributions:
      "AndersonDarling"      distribution, data    based on Expectation[d(x)^2/(F(x)(1-F(x)))]
      "CramerVonMises"       distribution, data    based on Expectation[d(x)^2]
      "JarqueBeraALM"        normality             based on skewness and kurtosis
      "KolmogorovSmirnov"    distribution, data    based on sup_x Abs[d(x)]
      "Kuiper"               distribution, data    based on sup_x d(x) - inf_x d(x)
      "PearsonChiSquare"     distribution, data    based on expected and observed histogram
      "ShapiroWilk"          normality             based on quantiles
      "WatsonUSquare"        distribution, data    based on Expectation[(d(x)-d̄)^2]
  • The following tests can be used for multivariate distributions:
      "BaringhausHenze"      normality     based on empirical characteristic function
      "DistanceToBoundary"   uniformity    based on distance to uniform boundaries
      "MardiaCombined"       normality     combined Mardia skewness and kurtosis
      "MardiaKurtosis"       normality     based on multivariate kurtosis
      "MardiaSkewness"       normality     based on multivariate skewness
      "SzekelyEnergy"        data          based on Newton's potential energy
  • DistributionFitTest[data,dist,"property"] can be used to directly give the value of "property".
  • Properties related to the reporting of test results include:
      "AllTests"               list of all applicable tests
      "AutomaticTest"          test chosen if Automatic is used
      "DegreesOfFreedom"       the degrees of freedom used in a test
      "PValue"                 list of p-values
      "PValueTable"            formatted table of p-values
      "ShortTestConclusion"    a short description of the conclusion of a test
      "TestConclusion"         a description of the conclusion of a test
      "TestData"               list of pairs of test statistics and p-values
      "TestDataTable"          formatted table of p-values and test statistics
      "TestStatistic"          list of test statistics
      "TestStatisticTable"     formatted table of test statistics
      "HypothesisTestData"     returns a HypothesisTestData object
  • DistributionFitTest[data,dist,"HypothesisTestData"] returns a HypothesisTestData object htd that can be used to extract additional test results and properties using the form htd["property"].
  • Properties related to the data distribution include:
      "FittedDistribution"              fitted distribution of data
      "FittedDistributionParameters"    distribution parameters of data
  • The following options can be given:
      Method               Automatic    the method to use for computing p-values
      SignificanceLevel    0.05         cutoff for diagnostics and reporting
  • For a test for goodness of fit, a cutoff α is chosen such that H0 is rejected only if p<α. The value of α used for the "TestConclusion" and "ShortTestConclusion" properties is controlled by the SignificanceLevel option. By default, α is set to 0.05.
  • With the setting Method->"MonteCarlo", n datasets of the same length as the input s_i are generated under H0 using the fitted distribution. The EmpiricalDistribution from DistributionFitTest[s_i,dist,{"TestStatistic",test}] is then used to estimate the p-value.
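
The two options described above can be sketched as follows. This is a minimal illustration, not an example from this page: the sample, the seed, and the names data and dist are assumptions, and 0.01 is an arbitrary cutoff.

```wolfram
(* Hypothetical sample: 200 draws from a standard normal distribution *)
SeedRandom[1];
dist = NormalDistribution[0, 1];
data = RandomVariate[dist, 200];

(* Conclusion reported at the default significance level of 0.05 *)
DistributionFitTest[data, dist, "ShortTestConclusion"]

(* The same conclusion with a stricter cutoff *)
DistributionFitTest[data, dist, "ShortTestConclusion", SignificanceLevel -> 0.01]

(* p-value estimated by simulation under the null hypothesis
   rather than by the default method *)
DistributionFitTest[data, dist, "PValue", Method -> "MonteCarlo"]
```

SignificanceLevel changes only the reported conclusion, not the p-value itself. A Monte Carlo p-value depends on the random draws, so it will generally differ slightly from run to run unless the seed is fixed.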

Examples


Basic Examples  (3)

Test some data for normality:
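
The input cells for this example are not preserved on this page. A minimal sketch of the step, assuming a synthetic standard normal sample (the seed, the sample size, and the name data are illustrative):

```wolfram
(* Hypothetical sample; not the page's original input *)
SeedRandom[1];
data = RandomVariate[NormalDistribution[], 1000];

(* With no dist argument the data is tested for normality;
   the result is a p-value *)
DistributionFitTest[data]
```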


Create a HypothesisTestData object for further property extraction:
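
A sketch of this step under the same assumptions as before (synthetic normal sample, illustrative names data and htd):

```wolfram
(* Hypothetical sample; not the page's original input *)
SeedRandom[1];
data = RandomVariate[NormalDistribution[], 1000];

(* Keep the full test object instead of a single p-value *)
htd = DistributionFitTest[data, Automatic, "HypothesisTestData"];

(* Individual properties are then extracted with htd["property"] *)
htd["PValue"]
```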


The full test table:
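
One way to request the table for every applicable test, sketched with an assumed synthetic sample and the illustrative name htd:

```wolfram
(* Hypothetical sample; not the page's original input *)
SeedRandom[1];
data = RandomVariate[NormalDistribution[], 1000];
htd = DistributionFitTest[data, Automatic, "HypothesisTestData"];

(* Formatted table of test statistics and p-values for all applicable tests *)
htd["TestDataTable", All]
```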


Compare the histogram of the data to the PDF of the test distribution:
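
A possible version of this comparison, assuming a synthetic sample and using the "FittedDistribution" property as the test distribution; the plot range {x, -4, 4} is chosen for a standard normal sample and is an assumption:

```wolfram
(* Hypothetical sample; not the page's original input *)
SeedRandom[1];
data = RandomVariate[NormalDistribution[], 1000];
htd = DistributionFitTest[data, Automatic, "HypothesisTestData"];

(* Density-scaled histogram of the data overlaid with the fitted PDF *)
Show[
 Histogram[data, Automatic, "ProbabilityDensity"],
 Plot[PDF[htd["FittedDistribution"], x], {x, -4, 4}, PlotStyle -> Thick]
]
```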


Test the fit of a set of data to a particular distribution:
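
The original input is not preserved here. A minimal sketch, assuming data drawn from a Weibull distribution and tested against that same distribution (the parameters, seed, and names are illustrative):

```wolfram
(* Hypothetical test distribution and sample *)
SeedRandom[2];
dist = WeibullDistribution[1, 2];
data = RandomVariate[dist, 200];

(* Store the result as a HypothesisTestData object for later extraction *)
htd = DistributionFitTest[data, dist, "HypothesisTestData"];
```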


Extract the AndersonDarling test table:
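
A sketch of the extraction using the {"property", test} form described in the Details section, with the same assumed Weibull sample:

```wolfram
(* Hypothetical test distribution and sample *)
SeedRandom[2];
dist = WeibullDistribution[1, 2];
data = RandomVariate[dist, 200];

(* Table of statistic and p-value restricted to the Anderson-Darling test *)
DistributionFitTest[data, dist, {"TestDataTable", "AndersonDarling"}]
```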


Verify the test results with ProbabilityPlot:
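
A visual check along these lines, again with an assumed Weibull sample:

```wolfram
(* Hypothetical test distribution and sample *)
SeedRandom[2];
dist = WeibullDistribution[1, 2];
data = RandomVariate[dist, 200];

(* Points close to the reference line indicate agreement with dist *)
ProbabilityPlot[data, dist]
```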


Test data for goodness of fit to a multivariate distribution:
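
The three input cells of this example are not preserved. A minimal sketch, assuming a bivariate normal test distribution and a sample drawn from it (the mean vector, covariance matrix, seed, and names are illustrative):

```wolfram
(* Hypothetical bivariate test distribution *)
dist = MultinormalDistribution[{0, 0}, {{1, 0.5}, {0.5, 1}}];

(* Hypothetical sample of 500 points, each a pair {x, y} *)
SeedRandom[3];
data = RandomVariate[dist, 500];

(* p-value for the fit of the multivariate data to dist *)
DistributionFitTest[data, dist]
```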


Plot the marginal PDFs of the test distribution against the data to confirm the test results:
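
One way to draw this comparison, assuming the same hypothetical bivariate normal setup; the plot range {x, -4, 4} is an assumption suited to unit-variance marginals:

```wolfram
(* Hypothetical bivariate test distribution and sample *)
dist = MultinormalDistribution[{0, 0}, {{1, 0.5}, {0.5, 1}}];
SeedRandom[3];
data = RandomVariate[dist, 500];

(* For each coordinate, overlay the marginal PDF of dist
   on a density-scaled histogram of that column of the data *)
Table[
 Show[
  Histogram[data[[All, i]], Automatic, "PDF"],
  Plot[PDF[MarginalDistribution[dist, i], x], {x, -4, 4}, PlotStyle -> Thick]
 ],
 {i, 2}
]
```

Agreement of the marginals is necessary but not sufficient for a good multivariate fit, since it says nothing about the dependence between coordinates.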


Scope  (21)

Options  (6)

Applications  (12)

Properties & Relations  (16)

Possible Issues  (5)

Neat Examples  (1)

See Also

EstimatedDistribution  FindDistributionParameters  HypothesisTestData  LocationTest  VarianceTest  IndependenceTest  LogRankTest  AndersonDarlingTest  KolmogorovSmirnovTest  CramerVonMisesTest  JarqueBeraALMTest  KuiperTest  MardiaCombinedTest  MardiaKurtosisTest  MardiaSkewnessTest  BaringhausHenzeTest  PearsonChiSquareTest  ShapiroWilkTest  WatsonUSquareTest

Introduced in 2010 (8.0) | Updated in 2015 (10.2)