# KolmogorovSmirnovTest

KolmogorovSmirnovTest[data]

tests whether data is normally distributed using the Kolmogorov–Smirnov test.

KolmogorovSmirnovTest[data,dist]

tests whether data is distributed according to dist using the Kolmogorov–Smirnov test.

KolmogorovSmirnovTest[data,dist,"property"]

returns the value of "property".

# Details and Options

• KolmogorovSmirnovTest performs the Kolmogorov–Smirnov goodness-of-fit test with null hypothesis $H_0$ that data was drawn from a population with distribution dist and alternative hypothesis $H_a$ that it was not.
• By default, a probability value or P-value is returned.
• A small P-value suggests that it is unlikely that the data came from dist.
• The dist can be any symbolic distribution with numeric and symbolic parameters or a dataset.
• The data can be univariate {x1,x2,…} or multivariate {{x1,y1,…},{x2,y2,…},…}.
• The Kolmogorov–Smirnov test assumes that the data came from a continuous distribution.
• The Kolmogorov–Smirnov test effectively uses a test statistic based on $\sup_x \lvert \hat{F}(x) - F(x) \rvert$, where $\hat{F}$ is the empirical CDF of data and $F$ is the CDF of dist.
• For multivariate tests, the sum of the univariate marginal P-values is used and is assumed to follow a UniformSumDistribution under $H_0$.
• KolmogorovSmirnovTest[data,dist,"HypothesisTestData"] returns a HypothesisTestData object htd that can be used to extract additional test results and properties using the form htd["property"].
• KolmogorovSmirnovTest[data,dist,"property"] can be used to directly give the value of "property".
• Properties related to the reporting of test results include:

| Property | Description |
|---|---|
| "PValue" | P-value |
| "PValueTable" | formatted version of "PValue" |
| "ShortTestConclusion" | a short description of the conclusion of a test |
| "TestConclusion" | a description of the conclusion of a test |
| "TestData" | test statistic and P-value |
| "TestDataTable" | formatted version of "TestData" |
| "TestStatistic" | test statistic |
| "TestStatisticTable" | formatted "TestStatistic" |

• The following properties are related to the data distribution and are independent of which test is being performed:

| Property | Description |
|---|---|
| "FittedDistribution" | fitted distribution of data |
| "FittedDistributionParameters" | distribution parameters of data |

• The following options can be given:

| Option | Default value | Description |
|---|---|---|
| Method | Automatic | the method to use for computing P-values |
| SignificanceLevel | 0.05 | cutoff for diagnostics and reporting |

• For a test for goodness of fit, a cutoff $\alpha$ is chosen such that $H_0$ is rejected only if $p < \alpha$. The value of $\alpha$ used for the "TestConclusion" and "ShortTestConclusion" properties is controlled by the SignificanceLevel option. By default, $\alpha$ is set to 0.05.
• With the setting Method->"MonteCarlo", a number of datasets $s_i$ of the same length as the input are generated under $H_0$ using the fitted distribution. The EmpiricalDistribution of the statistics KolmogorovSmirnovTest[si,dist,"TestStatistic"] is then used to estimate the P-value.
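
The combination rule for multivariate data can be illustrated outside the Wolfram Language. The following Python sketch is not how KolmogorovSmirnovTest is implemented; it assumes the univariate marginal P-values have already been computed and refers their sum to the Irwin–Hall distribution (the law of a sum of independent Uniform(0, 1) variables, which UniformSumDistribution represents). Reporting the lower tail, and the helper names irwin_hall_cdf and combined_p_value, are assumptions made for illustration.

```python
from math import comb, factorial, floor


def irwin_hall_cdf(x: float, n: int) -> float:
    """CDF of the sum of n independent Uniform(0, 1) variables (Irwin-Hall law)."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)


def combined_p_value(marginal_p_values: list[float]) -> float:
    """Refer the sum of marginal P-values to the Irwin-Hall law.

    Assumption: a small sum (many small marginal P-values) is evidence
    against H0, so the lower tail P(S <= s) is reported.
    """
    s = sum(marginal_p_values)
    return irwin_hall_cdf(s, len(marginal_p_values))


print(combined_p_value([0.5, 0.5]))  # 0.5: the sum sits at the median of S
print(combined_p_value([0.01, 0.02, 0.03]))  # small marginal P-values give a tiny combined value
```

Under $H_0$ each marginal P-value is uniform on (0, 1), so with independent marginals their sum follows the Irwin–Hall law; this is why independence of the marginals matters for the multivariate test.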

# Examples


## Basic Examples(3)

Perform a Kolmogorov–Smirnov test for normality:

Test the fit of some data to a particular distribution:

Compare the distributions of two datasets:

There is not sufficient evidence to conclude that the two datasets come from different distributions:
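
When two datasets are compared, the statistic measures the largest vertical gap between their two empirical CDFs. A minimal Python sketch of the standard two-sample statistic follows; it assumes the usual definition $\sup_x \lvert \hat{F}_a(x) - \hat{F}_b(x) \rvert$, uses hypothetical helper names, and stops short of computing a P-value.

```python
from bisect import bisect_right


def ecdf(sorted_sample: list[float], x: float) -> float:
    """Empirical CDF: fraction of observations less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def two_sample_ks_statistic(a: list[float], b: list[float]) -> float:
    """sup_x |F_a(x) - F_b(x)| for the empirical CDFs of a and b.

    Both empirical CDFs are right-continuous step functions that only jump at
    observations, so the supremum is attained at one of the pooled data points.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    return max(abs(ecdf(a_sorted, x) - ecdf(b_sorted, x)) for x in a_sorted + b_sorted)


print(two_sample_ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

Checking only the pooled observations is enough because the difference of two step functions is constant between consecutive jump points.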

## Scope(9)

### Testing(6)

Perform a Kolmogorov–Smirnov test for normality:

The P-value for the normal data is large compared to the P-value for the non-normal data:

Test the goodness of fit to a particular distribution:

Compare the distributions of two datasets:

The two datasets do not have the same distribution:

Test for multivariate normality:

Test for goodness of fit to any multivariate distribution:

Create a HypothesisTestData object for repeated property extraction:

The properties available for extraction:

### Reporting(3)

Tabulate the results of the Kolmogorov–Smirnov test:

The full test table:

A P-value table:

The test statistic:

Retrieve the entries from a Kolmogorov–Smirnov test table for custom reporting:

Report test conclusions using "ShortTestConclusion" and "TestConclusion":

The conclusion may differ at a different significance level:

## Options(4)

### Method(3)

Use Monte Carlo-based methods to compute the P-value:

Set the number of samples to use for Monte Carlo-based methods:

The Monte Carlo estimate converges to the true P-value with increasing samples:

Set the random seed used in Monte Carlo-based methods:

The seed affects the state of the generator and has some effect on the resulting P-value:
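
The Monte Carlo procedure described under Details can be sketched in Python for a normality test with estimated parameters. This is an illustration rather than the built-in algorithm: n_samples and seed are hypothetical parameter names standing in for the number of simulated datasets and the random seed, and the P-value is estimated as the fraction of simulated statistics at least as large as the observed one.

```python
import random
from statistics import NormalDist, fmean, pstdev


def ks_statistic(sample: list[float], cdf) -> float:
    """One-sample statistic D = sup_x |F_n(x) - F(x)| for a continuous CDF F."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def fitted_normal(sample: list[float]) -> NormalDist:
    """Maximum-likelihood normal fit: sample mean and population standard deviation."""
    return NormalDist(fmean(sample), pstdev(sample))


def monte_carlo_p_value(data, n_samples=1000, seed=None):
    """Estimate the P-value of a normality test whose parameters are estimated from data.

    Simulated datasets of the same length are drawn from the fitted normal
    distribution; the parameters are re-estimated for every simulated dataset,
    and the P-value is the fraction of simulated statistics >= the observed one.
    """
    rng = random.Random(seed)
    fitted = fitted_normal(data)
    d_observed = ks_statistic(data, fitted.cdf)
    exceed = 0
    for _ in range(n_samples):
        simulated = [rng.gauss(fitted.mean, fitted.stdev) for _ in data]
        if ks_statistic(simulated, fitted_normal(simulated).cdf) >= d_observed:
            exceed += 1
    return exceed / n_samples


gen = random.Random(2024)
data = [gen.gauss(10.0, 2.0) for _ in range(30)]
for n_samples in (100, 1000, 2000):
    # Same seed, more samples: the estimate stabilizes as n_samples grows
    print(n_samples, monte_carlo_p_value(data, n_samples=n_samples, seed=1))
# Different seed, same number of samples: a slightly different estimate
print(monte_carlo_p_value(data, n_samples=1000, seed=2))
```

Because the parameters are re-estimated for every simulated dataset, the simulated statistics account for the fitting step, which is the same issue that Lilliefors' correction addresses.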

### SignificanceLevel(1)

Set the significance level used for "TestConclusion" and "ShortTestConclusion":

By default, $\alpha = 0.05$ is used:

## Applications(2)

A power curve for the Kolmogorov–Smirnov test:

Visualize the approximate power curve:

Estimate the power of the Kolmogorov–Smirnov test when the underlying distribution is UniformDistribution[{-4,4}], the test size is 0.05, and the sample size is 12:
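
Power can be estimated by simulation: draw many samples from the alternative and count how often the test rejects. The Python sketch below assumes a fully specified null hypothesis, a standard normal distribution with no estimated parameters, and takes the rejection cutoff from the simulated null distribution of the statistic rather than from a closed-form P-value. The function and parameter names are illustrative.

```python
import random
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """One-sample statistic D = sup_x |F_n(x) - F(x)| for a continuous CDF F."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def estimate_power(n=12, alpha=0.05, n_sims=2000, seed=0):
    """Monte Carlo power of a KS test of H0: N(0, 1) against Uniform(-4, 4) data.

    Assumption: the null distribution is fully specified (no parameters estimated).
    """
    rng = random.Random(seed)
    null_cdf = NormalDist(0.0, 1.0).cdf
    # Simulate the null distribution of D for samples of size n drawn from N(0, 1)
    null_stats = sorted(
        ks_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], null_cdf) for _ in range(n_sims)
    )
    # Approximate (1 - alpha) quantile of D under H0, used as the rejection cutoff
    critical = null_stats[min(n_sims - 1, int((1 - alpha) * n_sims))]
    # Power: fraction of samples from the alternative whose statistic exceeds the cutoff
    rejections = sum(
        ks_statistic([rng.uniform(-4.0, 4.0) for _ in range(n)], null_cdf) > critical
        for _ in range(n_sims)
    )
    return rejections / n_sims


print(estimate_power(n=12, alpha=0.05))
# A rough power curve: power as a function of the sample size
for size in (5, 12, 25, 50):
    print(size, estimate_power(n=size))
```

Using a simulated cutoff keeps the size of the test close to alpha even for small samples, where asymptotic approximations are least reliable.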

A sample of 31 sheets of airplane glass was subjected to constant stress until breakage. Investigate whether the data is drawn from a NormalDistribution or a GammaDistribution:

Compare the quantile-quantile plots for the candidate distributions:

The data appears to fit a GammaDistribution slightly better than a NormalDistribution:

## Properties & Relations(9)

By default, univariate data is compared to a NormalDistribution:

The parameters have been estimated from the data:

Multivariate data is compared to a MultinormalDistribution by default:

The parameters of the test distribution are estimated from the data if not specified:

Specified parameters are not estimated:

Maximum-likelihood estimates are used for unspecified parameters of the test distribution:

If the parameters are unknown, KolmogorovSmirnovTest applies a correction when possible:

The parameters are estimated but no correction is applied:

The fitted distribution is the same as before and the P-value is corrected:

When parameters are estimated, Lilliefors' correction is used:

Estimate the parameters prior to testing to perform the classical Kolmogorov–Smirnov test:
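
The classical test treats the plugged-in estimates as if they were known and refers the statistic to the Kolmogorov distribution. A Python sketch of that classical P-value follows, using the asymptotic series $Q(\lambda) = 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2\lambda^2}$ with Stephens' small-sample scaling $\lambda = (\sqrt{n} + 0.12 + 0.11/\sqrt{n})\,D$. This approximation is chosen for illustration and is not necessarily what KolmogorovSmirnovTest computes; the function names are hypothetical. When the parameters come from the same data, this P-value tends to be too large, which is why a correction is applied by default.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def ks_statistic(sample, cdf):
    """One-sample statistic D = sup_x |F_n(x) - F(x)| for a continuous CDF F."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def kolmogorov_sf(lam, terms=100):
    """Asymptotic tail P(K > lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        # The series converges slowly here, and the true value is 1 to within about 1e-12
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def classical_ks_p_value(sample, cdf):
    """Classical KS P-value for a CDF treated as fully specified (Stephens' scaling)."""
    d = ks_statistic(sample, cdf)
    root_n = math.sqrt(len(sample))
    return kolmogorov_sf((root_n + 0.12 + 0.11 / root_n) * d)


gen = random.Random(7)
data = [gen.expovariate(1.0) for _ in range(40)]
# Estimate the parameters first, then treat them as known (the classical test)
fitted = NormalDist(fmean(data), pstdev(data))
print(classical_ks_p_value(data, fitted.cdf))
```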

Conceptually, the Kolmogorov–Smirnov test computes the maximum absolute difference between the empirical and theoretical CDFs:

Plot the CDFs, showing the maximum absolute difference:
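
The maximum absolute difference can also be located numerically. Because the empirical CDF only jumps at the observations, it suffices to check the gap just before and just after each jump. The Python sketch below assumes a fully specified standard normal CDF and returns both the statistic and an observation at which it is attained; the function name is hypothetical.

```python
from statistics import NormalDist


def ks_statistic_with_location(sample, cdf):
    """Return (D, x_star): D = sup_x |F_n(x) - F(x)| and an observation where it is attained.

    The empirical CDF F_n jumps at each sorted observation x_i, so the largest gap
    is found either just after a jump (i/n - F(x_i)) or just before it (F(x_i) - (i-1)/n).
    """
    xs = sorted(sample)
    n = len(xs)
    best_d, best_x = -1.0, xs[0]
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        gap = max(i / n - f, f - (i - 1) / n)
        if gap > best_d:
            best_d, best_x = gap, x
    return best_d, best_x


d, x_star = ks_statistic_with_location([-0.3, 0.4, 1.1, 1.6, 2.2], NormalDist().cdf)
print(f"D = {d:.4f}, attained at x = {x_star}")
```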

Independent marginal densities are assumed in tests for multivariate goodness of fit:

The test statistic is identical when independence is assumed:

The Kolmogorov–Smirnov test works with the values only when the input is a TimeSeries:

## Possible Issues(3)

The Kolmogorov–Smirnov test is not intended for discrete distributions:

The test tends to be conservative:

Use Monte Carlo methods or PearsonChiSquareTest in these cases:

The Kolmogorov–Smirnov test is not valid for some distributions when parameters have been estimated from the data:

Provide parameter values if they are known:

Alternatively, use Monte Carlo methods to approximate the P-value:

Ties in the data are ignored:

Differences may be more apparent with larger numbers of ties:

## Neat Examples(1)

Compute the statistic when the null hypothesis is true:

The test statistic given a particular alternative:

Compare the distributions of the test statistics: