KolmogorovSmirnovTest
KolmogorovSmirnovTest[data]
tests whether data is normally distributed using the Kolmogorov–Smirnov test.
KolmogorovSmirnovTest[data,dist]
tests whether data is distributed according to dist using the Kolmogorov–Smirnov test.
KolmogorovSmirnovTest[data,dist,"property"]
returns the value of "property".
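As a minimal sketch of the three calling forms above, the following uses simulated data. The sample, its size, the seed, the comparison distribution, and the variable names are illustrative assumptions, not part of the original page.

```wolfram
(* Illustrative data: 200 draws from a standard normal (an assumption for this sketch) *)
SeedRandom[1];
data = RandomVariate[NormalDistribution[0, 1], 200];

(* Form 1: test for normality; a p-value is returned by default *)
pNormal = KolmogorovSmirnovTest[data];

(* Form 2: test against a fully specified distribution *)
pUniform = KolmogorovSmirnovTest[data, UniformDistribution[{-4, 4}]];

(* Form 3: request a named property instead of the p-value *)
stat = KolmogorovSmirnovTest[data, NormalDistribution[0, 1], "TestStatistic"];

{pNormal, pUniform, stat}
```

Because the data is simulated from a normal distribution, the first p-value would be expected to be large and the second small, though the exact numbers depend on the seed.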
Details and Options
- KolmogorovSmirnovTest performs the Kolmogorov–Smirnov goodness-of-fit test with null hypothesis $H_0$ that data was drawn from a population with distribution dist and alternative hypothesis $H_a$ that it was not.
- By default, a probability value or p-value is returned.
- A small p-value suggests that it is unlikely that the data came from dist.
- The dist can be any symbolic distribution with numeric and symbolic parameters or a dataset.
- The data can be univariate {x1,x2,…} or multivariate {{x1,y1,…},{x2,y2,…},…}.
- The Kolmogorov–Smirnov test assumes that the data came from a continuous distribution.
- The Kolmogorov–Smirnov test effectively uses a test statistic based on $\sup_x |\hat{F}(x) - F(x)|$, where $\hat{F}(x)$ is the empirical CDF of data and $F(x)$ is the CDF of dist.
- For multivariate tests, the sum of the univariate marginal p-values is used and is assumed to follow a UniformSumDistribution under $H_0$.
- KolmogorovSmirnovTest[data,dist,"HypothesisTestData"] returns a HypothesisTestData object htd that can be used to extract additional test results and properties using the form htd["property"].
- KolmogorovSmirnovTest[data,dist,"property"] can be used to directly give the value of "property".
- Properties related to the reporting of test results include:
  "PValue"               p-value
  "PValueTable"          formatted version of "PValue"
  "ShortTestConclusion"  a short description of the conclusion of a test
  "TestConclusion"       a description of the conclusion of a test
  "TestData"             test statistic and p-value
  "TestDataTable"        formatted version of "TestData"
  "TestStatistic"        test statistic
  "TestStatisticTable"   formatted "TestStatistic"
- The following properties are independent of which test is being performed.
- Properties related to the data distribution include:
  "FittedDistribution"            fitted distribution of data
  "FittedDistributionParameters"  distribution parameters of data
- The following options can be given:
  Method             Automatic  the method to use for computing p-values
  SignificanceLevel  0.05       cutoff for diagnostics and reporting
- For a test for goodness of fit, a cutoff $\alpha$ is chosen such that $H_0$ is rejected only if $p < \alpha$. The value of $\alpha$ used for the "TestConclusion" and "ShortTestConclusion" properties is controlled by the SignificanceLevel option. By default, $\alpha$ is set to 0.05.
- With the setting Method->"MonteCarlo", $n$ datasets $s_i$ of the same length as the input are generated under $H_0$ using the fitted distribution. The EmpiricalDistribution from KolmogorovSmirnovTest[si,dist,"TestStatistic"] is then used to estimate the p-value.
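The Monte Carlo procedure described in the last item can be sketched by hand. This is only an illustration under stated assumptions: the data is simulated, the distribution is fully specified rather than fitted, and the names `observed`, `simulated`, `nSim`, and `pEstimate` are hypothetical. It is not a description of the built-in implementation.

```wolfram
(* Illustrative setup: simulated data and a fully specified null distribution *)
SeedRandom[1234];
dist = NormalDistribution[2, 3];
data = RandomVariate[dist, 50];

(* Test statistic for the observed data *)
observed = KolmogorovSmirnovTest[data, dist, "TestStatistic"];

(* Generate datasets of the same length under the null and collect their statistics *)
nSim = 500;
simulated = Table[
   KolmogorovSmirnovTest[RandomVariate[dist, Length[data]], dist, "TestStatistic"],
   {nSim}];

(* Estimate the p-value as the upper-tail mass of the empirical distribution *)
pEstimate = N[1 - CDF[EmpiricalDistribution[simulated], observed]];

(* Built-in Monte Carlo setting, for comparison *)
pBuiltIn = KolmogorovSmirnovTest[data, dist, Method -> "MonteCarlo"];

{pEstimate, pBuiltIn}
```

The two values should be of similar magnitude but need not agree exactly, since both are random estimates.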
Examples
Basic Examples (3)
Scope (9)
Testing (6)
Perform a Kolmogorov–Smirnov test for normality:
The p-value for the normal data is large compared to the p-value for the non-normal data:
Test the goodness of fit to a particular distribution:
Compare the distributions of two datasets:
The two datasets do not have the same distribution:
Test for multivariate normality:
Test for goodness of fit to any multivariate distribution:
Create a HypothesisTestData object for repeated property extraction:
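A minimal sketch of repeated property extraction, using only property names listed in the Details section. The simulated data, the seed, and the variable names are assumptions made for illustration.

```wolfram
(* Illustrative data (an assumption for this sketch) *)
SeedRandom[7];
data = RandomVariate[NormalDistribution[0, 1], 100];

(* Run the test once and keep the result object *)
htd = KolmogorovSmirnovTest[data, NormalDistribution[0, 1], "HypothesisTestData"];

(* Extract several properties without repeating the test *)
pValue = htd["PValue"];
statistic = htd["TestStatistic"];
summary = htd["ShortTestConclusion"];

{pValue, statistic, summary}
```

Storing the HypothesisTestData object avoids recomputing the test each time a different property is needed.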
Options (4)
Method (3)
Use Monte Carlo-based methods or a computation formula:
Set the number of samples to use for Monte Carlo-based methods:
The Monte Carlo estimate converges to the true p-value with increasing samples:
Set the random seed used in Monte Carlo-based methods:
The seed affects the state of the generator and has some effect on the resulting p-value:
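The option settings discussed in this section might be combined as in the sketch below. The suboption names "MonteCarloSamples" and "RandomSeed" do not appear in the text of this page; they are recalled from Wolfram Language documentation and should be treated as assumptions to check against the installed version. The data and variable names are illustrative.

```wolfram
(* Illustrative data (an assumption for this sketch) *)
SeedRandom[5];
data = RandomVariate[NormalDistribution[0, 1], 60];

(* Monte Carlo p-value with an explicit sample count and seed;
   the two suboption names are assumptions, see the note above *)
pMC = KolmogorovSmirnovTest[data, NormalDistribution[0, 1],
   Method -> {"MonteCarlo", "MonteCarloSamples" -> 1000, "RandomSeed" -> 1}];

(* SignificanceLevel sets the cutoff used by the conclusion properties *)
conclusion = KolmogorovSmirnovTest[data, NormalDistribution[0, 1],
   "ShortTestConclusion", SignificanceLevel -> 0.01];

{pMC, conclusion}
```

Fixing the seed makes the Monte Carlo p-value reproducible between runs; increasing the sample count should reduce its variability at the cost of more computation.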
Applications (2)
A power curve for the Kolmogorov–Smirnov test:
Visualize the approximate power curve:
Estimate the power of the Kolmogorov–Smirnov test when the underlying distribution is a UniformDistribution[{-4,4}], the test size is 0.05, and the sample size is 12:
A sample of 31 sheets of airplane glass was subjected to a constant stress until breakage. Investigate whether the data is drawn from a NormalDistribution or a GammaDistribution:
Compare the quantile-quantile plots for the candidate distributions:
The data appears to fit a GammaDistribution slightly better than a NormalDistribution:
Properties & Relations (9)
By default, univariate data is compared to a NormalDistribution:
The parameters have been estimated from the data:
Multivariate data is compared to a MultinormalDistribution by default:
The parameters of the test distribution are estimated from the data if not specified:
Specified parameters are not estimated:
Maximum-likelihood estimates are used for unspecified parameters of the test distribution:
If the parameters are unknown, KolmogorovSmirnovTest applies a correction when possible:
The parameters are estimated but no correction is applied:
The fitted distribution is the same as before and the p-value is corrected:
When parameters are estimated, Lilliefors' correction is used:
Estimate the parameters prior to testing to perform the classical Kolmogorov–Smirnov test:
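The distinction between the corrected and classical tests described above can be sketched as follows. The simulated data and the names `mu`, `sigma`, `fitted`, `pCorrected`, and `pClassical` are illustrative assumptions.

```wolfram
(* Illustrative data (an assumption for this sketch) *)
SeedRandom[3];
data = RandomVariate[NormalDistribution[1, 2], 80];

(* Symbolic parameters: estimated internally, so a correction can be applied *)
pCorrected = KolmogorovSmirnovTest[data, NormalDistribution[mu, sigma]];

(* Estimate the parameters first (maximum likelihood by default),
   then test against the resulting fully specified distribution *)
fitted = EstimatedDistribution[data, NormalDistribution[mu, sigma]];
pClassical = KolmogorovSmirnovTest[data, fitted];

{pCorrected, pClassical}
```

In the second call the test has no way of knowing that the parameters came from the same data, so it treats the distribution as fully specified and no correction is applied.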
Conceptually, the Kolmogorov–Smirnov test computes the maximum absolute difference between the empirical and theoretical CDFs:
Plot the CDFs, showing the maximum absolute difference:
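A hand computation of the maximum absolute difference between the empirical and theoretical CDFs might look like the sketch below. The data and variable names are illustrative assumptions. Since the page says only that the test "effectively" uses a statistic based on this quantity, the built-in "TestStatistic" is shown for comparison without asserting that the two must be identical.

```wolfram
(* Illustrative data and a fully specified distribution (assumptions for this sketch) *)
SeedRandom[42];
dist = NormalDistribution[0, 1];
data = RandomVariate[dist, 100];
n = Length[data];

(* Theoretical CDF evaluated at the sorted sample points *)
f = CDF[dist, #] & /@ Sort[data];

(* The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point,
   so the supremum is attained on one side of a jump *)
dPlus = Max[Range[n]/n - f];
dMinus = Max[f - (Range[n] - 1)/n];
dManual = Max[dPlus, dMinus];

(* Built-in statistic, for comparison *)
dBuiltIn = KolmogorovSmirnovTest[data, dist, "TestStatistic"];

{dManual, dBuiltIn}
```

Checking both sides of each jump matters because the empirical CDF is a step function: the largest gap can occur just before a data point as well as at it.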
Independent marginal densities are assumed in tests for multivariate goodness of fit:
The test statistic is identical when independence is assumed:
The Kolmogorov–Smirnov test works with the values only when the input is a TimeSeries:
Possible Issues (3)
The Kolmogorov–Smirnov test is not intended for discrete distributions:
The test tends to be conservative:
Use Monte Carlo methods or PearsonChiSquareTest in these cases:
The Kolmogorov–Smirnov test is not valid for some distributions when parameters have been estimated from the data:
Provide parameter values if they are known:
Alternatively, use Monte Carlo methods to approximate the p-value:
Differences may be more apparent with larger numbers of ties:
Text
Wolfram Research (2010), KolmogorovSmirnovTest, Wolfram Language function, https://reference.wolfram.com/language/ref/KolmogorovSmirnovTest.html.