This is documentation for Mathematica 8, which was
based on an earlier version of the Wolfram Language.

SmoothKernelDistribution

SmoothKernelDistribution[{x1, x2, ...}]
represents a smooth kernel distribution based on the data values xi.
SmoothKernelDistribution[{{x1, y1, ...}, {x2, y2, ...}, ...}]
represents a multivariate smooth kernel distribution based on the data values {xi, yi, ...}.
SmoothKernelDistribution[..., bw]
represents a smooth kernel distribution with bandwidth bw.
SmoothKernelDistribution[..., bw, ker]
represents a smooth kernel distribution with bandwidth bw and smoothing kernel ker.
  • The probability density function for SmoothKernelDistribution for a value $x$ is given by a linearly interpolated version of $\frac{1}{n h} \sum_{i=1}^{n} k\left(\frac{x - x_i}{h}\right)$ for a smoothing kernel $k$ and bandwidth parameter $h$.
  • The following bandwidth specifications bw can be given:
h                      bandwidth to use
{"Standardized", h}    bandwidth in units of standard deviations
{"Adaptive", h, s}     adaptive with initial bandwidth h and sensitivity s
Automatic              automatically computed bandwidth
"name"                 use a named bandwidth selection method
{bwx, bwy, ...}        separate bandwidth specifications for x, y, etc.
  • For multivariate densities, h can be a positive definite symmetric matrix.
  • For adaptive bandwidths, the sensitivity s must be a real number between 0 and 1 or Automatic. If Automatic is used, s is set to a value that depends on n, where n is the dimensionality of the data.
  • Possible named bandwidth selection methods include:
"LeastSquaresCrossValidation"    use the method of least-squares cross-validation
"Oversmooth"                     1.08 times wider than the standard Gaussian
"Scott"                          use Scott's rule to determine bandwidth
"SheatherJones"                  use the Sheather-Jones plugin estimator
"Silverman"                      use Silverman's rule to determine bandwidth
"StandardDeviation"              use the standard deviation as bandwidth
"StandardGaussian"               optimal bandwidth for standard normal data
  • By default the "Silverman" method is used.
  • The following kernel specifications ker can be given:
"Biweight"
"Cosine"
"Epanechnikov"
"Gaussian"
"Rectangular"
"SemiCircle"
"Triangular"
"Triweight"
func    user-defined kernel function fn(u), u ∈ R
  • In order for SmoothKernelDistribution to generate a true density estimate, the function fn should be a valid probability density function.
  • By default the "Gaussian" kernel is used.
  • For multivariate densities, the kernel function ker can be specified as product and radial types using {"Product", ker} and {"Radial", ker}, respectively. Product-type kernels are used if no type is specified.
  • The precision used for density estimation is the minimum precision given in bw and the data.
  • When adaptive bandwidths are used in multivariate estimation, the maximum sensitivity given is used in all dimensions.
  • The following options can be given:
InterpolationPoints    Automatic    initial number of interpolation points to use
MaxMixtureKernels      Automatic    max number of kernels to use
MaxRecursion           Automatic    number of recursive subdivisions to allow
PerformanceGoal        "Speed"      optimize for speed or quality
MaxExtraBandwidths     Automatic    max bandwidths beyond data to use
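The argument forms described above can be combined as in the following minimal sketch. The sample data, the variable names, and the particular numeric values are illustrative assumptions and do not come from the original page.

```wolfram
(* Illustrative sample data; not from the original page *)
data = RandomVariate[NormalDistribution[0, 1], 200];

(* Explicit bandwidth 0.4 with a built-in kernel *)
d1 = SmoothKernelDistribution[data, 0.4, "Epanechnikov"];

(* Named bandwidth selection method with the default kernel *)
d2 = SmoothKernelDistribution[data, "SheatherJones"];

(* Adaptive bandwidth: automatic initial bandwidth, sensitivity 0.5 *)
d3 = SmoothKernelDistribution[data, {"Adaptive", Automatic, 0.5}];

(* Options follow the positional arguments *)
d4 = SmoothKernelDistribution[data, Automatic, "Gaussian",
   InterpolationPoints -> 200, MaxExtraBandwidths -> 4];
```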
Create an interpolated version of a kernel density estimate for some univariate data:
Use the resulting distribution to perform analysis, including visualizing distribution functions:
Compute moments and quantiles:
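A sketch of this univariate workflow, assuming normally distributed sample data in place of the page's original data (the names data and dist are illustrative):

```wolfram
(* Assumed sample data; the original inputs are not reproduced here *)
data = RandomVariate[NormalDistribution[0, 1], 500];
dist = SmoothKernelDistribution[data];

(* Visualize the estimated PDF and CDF *)
Plot[{PDF[dist, x], CDF[dist, x]}, {x, -4, 4}, PlotRange -> All]

(* Moments and quantiles of the estimate *)
{Mean[dist], Variance[dist]}
Quantile[dist, {0.25, 0.5, 0.75}]
```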
Create an interpolated version of a kernel density estimate of some bivariate data:
Visualize the estimated PDF and CDF:
Compute covariance and general moments:
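The bivariate case might look like the following sketch, which assumes correlated normal sample data; the correlation 0.6 and the plot ranges are arbitrary choices for illustration.

```wolfram
(* Assumed bivariate sample data with correlation 0.6 *)
data2 = RandomVariate[BinormalDistribution[{0, 0}, {1, 1}, 0.6], 500];
dist2 = SmoothKernelDistribution[data2];

(* Estimated joint PDF *)
Plot3D[PDF[dist2, {x, y}], {x, -3, 3}, {y, -3, 3}, PlotRange -> All]

(* Covariance matrix and a general (mixed) moment *)
Covariance[dist2] // MatrixForm
Moment[dist2, {1, 2}]
```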
Create an interpolated smooth density estimate for some data:
Compute probabilities from the distribution:
Increase the bandwidth for smoother estimates:
Allow the bandwidth to vary adaptively with local density:
Interpolate kernel density estimates in higher dimensions:
Plot the univariate marginal PDFs:
Plot the bivariate marginal PDFs:
Select from built-in kernel functions or build a custom one:
A custom kernel function:
Specify radial or product type kernels for multivariate estimates:
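The kernel choices above can be sketched as follows. The Laplace-type custom kernel and the sample data are assumptions made for illustration; any pure function that is a valid probability density should serve in the same position.

```wolfram
data = RandomVariate[NormalDistribution[], 200];

(* A built-in kernel *)
builtin = SmoothKernelDistribution[data, Automatic, "Biweight"];

(* A custom kernel given as a pure function; Exp[-Abs[u]]/2 integrates to 1 *)
custom = SmoothKernelDistribution[data, Automatic, Exp[-Abs[#]]/2 &];

(* Radial and product kernel types for bivariate data *)
data2 = RandomVariate[BinormalDistribution[0.5], 200];
radial = SmoothKernelDistribution[data2, Automatic, {"Radial", "Epanechnikov"}];
product = SmoothKernelDistribution[data2, Automatic, {"Product", "Epanechnikov"}];
```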
Estimate distribution functions:
Compute moments of the distribution:
Special moments:
General moments:
Quantile function:
Special quantile values:
Generate random numbers:
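As a rough illustration, random variates can be drawn from the estimate and compared with its PDF. The exponential sample data and the sample sizes are assumed for this sketch.

```wolfram
data = RandomVariate[ExponentialDistribution[1], 300];
dist = SmoothKernelDistribution[data];

(* Draw new values from the smoothed estimate *)
sample = RandomVariate[dist, 1000];

(* Compare a histogram of the draws with the estimated PDF *)
Show[Histogram[sample, Automatic, "PDF"],
 Plot[PDF[dist, x], {x, -1, 8}, PlotRange -> All]]
```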
Compute probabilities and expectations:
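One possible form of such a computation, assuming gamma-distributed sample data and an arbitrarily chosen event and integrand:

```wolfram
data = RandomVariate[GammaDistribution[2, 1], 500];
dist = SmoothKernelDistribution[data];

(* Probability of an interval under the estimated distribution *)
Probability[1 < x < 3, x \[Distributed] dist]

(* Numerical expectation of a function of the variable *)
NExpectation[x^2, x \[Distributed] dist]
```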
Estimate bivariate distribution functions:
Compute moments of a bivariate distribution:
Special moments:
General moments:
Generate random numbers:
Show the point distribution:
Automatically select the bandwidth to use:
More data yields better approximations to the underlying distribution:
Explicitly specify the bandwidth to use:
Use two different bandwidths:
Larger bandwidths yield smoother estimates:
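The effect of the bandwidth can be seen in a sketch like the one below. The bimodal sample data and the bandwidths 0.1 and 1.5 are assumptions chosen to make the contrast visible; they are not the values used on the original page.

```wolfram
(* Assumed bimodal sample data *)
data = RandomVariate[
   MixtureDistribution[{1, 1},
    {NormalDistribution[-2, 1], NormalDistribution[2, 1]}], 300];

narrow = SmoothKernelDistribution[data, 0.1];
wide = SmoothKernelDistribution[data, 1.5];

(* The wider bandwidth gives the smoother curve *)
Plot[{PDF[narrow, x], PDF[wide, x]}, {x, -6, 6}, PlotRange -> All]
```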
Specify bandwidths in units of standard deviation:
Use bandwidths given as two different multiples of the standard deviation:
Allow the bandwidth to vary adaptively with local density:
Vary the local sensitivity from 0 (none) to 1 (full):
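A sketch of varying the sensitivity of an adaptive bandwidth, assuming gamma-distributed sample data and three illustrative sensitivity values:

```wolfram
data = RandomVariate[GammaDistribution[2, 1], 300];

(* One estimate per sensitivity value, with an automatic initial bandwidth *)
dists = Table[
   SmoothKernelDistribution[data, {"Adaptive", Automatic, s}],
   {s, {0, 0.5, 1}}];

Plot[Evaluate[PDF[#, x] & /@ dists], {x, 0, 8}, PlotRange -> All]
```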
Vary the initial bandwidth for an adaptive estimate:
Specify two different initial bandwidths:
Use any of several automatic bandwidth selection methods:
Silverman's method is used by default:
The PDFs are equivalent:
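This can be checked with a sketch such as the following, which assumes normal sample data and overlays the two estimated PDFs:

```wolfram
data = RandomVariate[NormalDistribution[], 200];

(* Default bandwidth selection versus the explicitly named method *)
auto = SmoothKernelDistribution[data];
silverman = SmoothKernelDistribution[data, "Silverman"];

Plot[{PDF[auto, x], PDF[silverman, x]}, {x, -4, 4}, PlotRange -> All]
```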
By default, Silverman's method is used to independently select bandwidths in each dimension:
Any automated method can be used to independently select diagonal bandwidth elements:
Methods used to estimate the bandwidth diagonal need not be the same:
Use adaptive, oversmoothed, and constant bandwidths in the respective dimensions:
Plot the univariate marginal PDFs:
Give a scalar value to use the same bandwidth in all dimensions:
To use nonzero off-diagonal elements, give a fully specified bandwidth matrix:
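The multivariate bandwidth forms can be sketched as below. The sample data and all numeric values are illustrative assumptions; the matrix is symmetric and positive definite, as required.

```wolfram
data2 = RandomVariate[BinormalDistribution[0.7], 300];

(* The same scalar bandwidth in every dimension *)
scalar = SmoothKernelDistribution[data2, 0.5];

(* A separate specification per dimension *)
perDim = SmoothKernelDistribution[data2, {0.3, "Oversmooth"}];

(* A full bandwidth matrix with nonzero off-diagonal elements *)
full = SmoothKernelDistribution[data2, {{0.5, 0.2}, {0.2, 0.5}}];

ContourPlot[PDF[full, {x, y}], {x, -3, 3}, {y, -3, 3}]
```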
Specify any one of several kernel functions:
Define the kernel function as a pure function:
By default, the Gaussian kernel is used:
This is equivalent to using the PDF of a NormalDistribution:
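A minimal sketch of that equivalence, assuming normal sample data, passes the standard normal PDF as a pure-function kernel and plots both estimates together:

```wolfram
data = RandomVariate[NormalDistribution[], 200];

named = SmoothKernelDistribution[data, Automatic, "Gaussian"];
viaPDF = SmoothKernelDistribution[data, Automatic,
   PDF[NormalDistribution[0, 1], #] &];

Plot[{PDF[named, x], PDF[viaPDF, x]}, {x, -4, 4}, PlotRange -> All]
```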
Shapes of some univariate kernel functions:
Specify any one of several kernel functions for multivariate data:
Choose between product and radial-type kernel functions for multivariate data:
By default, nonuniform interpolation is used to create a smooth estimate:
Specify the initial number of sample points to use:
Use 2 interpolation points:
A larger number of points yields a smoother estimate:
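The following sketch contrasts a coarse and a fine interpolation grid. The point counts 5 and 200 are arbitrary, and MaxRecursion -> 0 is set on the assumption that recursive refinement would otherwise mask the difference between the two settings.

```wolfram
data = RandomVariate[NormalDistribution[], 300];

coarse = SmoothKernelDistribution[data, Automatic, "Gaussian",
   InterpolationPoints -> 5, MaxRecursion -> 0];
fine = SmoothKernelDistribution[data, Automatic, "Gaussian",
   InterpolationPoints -> 200, MaxRecursion -> 0];

Plot[{PDF[coarse, x], PDF[fine, x]}, {x, -4, 4}, PlotRange -> All]
```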
Specify the number of interpolating points to use for bivariate data:
Use 3 and 30 interpolation points in each dimension:
Use different numbers of interpolation points in each dimension:
Specify 3 and 30 points or 30 and 3:
A smooth result does not imply a high-quality estimate:
Using 1000 interpolation points creates a very smooth estimate in this case:
By default the estimate extends at most 12 bandwidths beyond the data:
Set the maximum number of bandwidths to use:
Use 0 and 12 bandwidths, respectively:
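A sketch of the two settings, assuming exponential sample data so that the left edge of the data is easy to see:

```wolfram
data = RandomVariate[ExponentialDistribution[1], 300];

(* No extension beyond the data versus up to 12 bandwidths *)
none = SmoothKernelDistribution[data, Automatic, "Gaussian",
   MaxExtraBandwidths -> 0];
twelve = SmoothKernelDistribution[data, Automatic, "Gaussian",
   MaxExtraBandwidths -> 12];

Plot[{PDF[none, x], PDF[twelve, x]}, {x, -2, 8}, PlotRange -> All]
```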
Set a different number for each endpoint:
Specify the number of extra bandwidths to use for multivariate data:
Use 0 and 12 bandwidths, respectively:
Specify the number of extra bandwidths to use in each dimension:
Use 0 and 12 bandwidths or 12 and 0 bandwidths, respectively:
Set a different number for each endpoint in each dimension:
By default the number of kernels is generally optimal:
Specify the maximum number of kernels to use in the estimate:
Place at most 5 kernels:
A larger number of kernels gives a better estimate of the underlying distribution:
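For illustration, the sketch below compares a small and a larger kernel limit. The sample data and the limits 5 and 100 are assumed values.

```wolfram
data = RandomVariate[NormalDistribution[], 500];

few = SmoothKernelDistribution[data, Automatic, "Gaussian",
   MaxMixtureKernels -> 5];
many = SmoothKernelDistribution[data, Automatic, "Gaussian",
   MaxMixtureKernels -> 100];

Plot[{PDF[few, x], PDF[many, x]}, {x, -4, 4}, PlotRange -> All]
```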
Place a kernel at each data point:
Vary the bandwidth used for the same number of kernels:
Specify the maximum number of kernels to use in each dimension for bivariate data:
Place at most 10 and 100 kernels, respectively:
Set the maximum number of kernels in each dimension:
Specify a maximum of 5 and 50 kernels or 50 and 5:
A smooth estimate will usually be returned by default:
Specify the maximum number of recursive subdivisions to use:
Vary the number of recursive subdivisions while using 3 InterpolationPoints:
Give the maximum number of recursive subdivisions for bivariate data:
Use at most 2 and 6 subdivisions, respectively:
Set the maximum number of recursive subdivisions in each dimension:
Specify a maximum of 0 and 3 subdivisions or 3 and 0:
By default, estimates are optimized for a balance between speed and quality:
Set PerformanceGoal for speed or quality or use Automatic to balance the two:
More time is spent with PerformanceGoal set to "Quality":
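A timing comparison could be sketched as follows. The sample size is an arbitrary assumption, and the measured times depend on the machine, so no particular result is implied.

```wolfram
data = RandomVariate[NormalDistribution[], 10^4];

(* Time the construction of the estimate under each goal *)
AbsoluteTiming[
 SmoothKernelDistribution[data, Automatic, PerformanceGoal -> "Speed"];]
AbsoluteTiming[
 SmoothKernelDistribution[data, Automatic, PerformanceGoal -> "Quality"];]
```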
Use with ControlActive to vary PerformanceGoal dynamically:
Compare an estimated density to a theoretical model:
Use adaptive bandwidths for highly oscillatory densities:
The moments of the model and the estimate are similar:
Use TruncatedDistribution to restrict the domain after smoothing:
The estimate is restricted to positive values:
Verify that the distribution is bounded by the truncation region:
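A sketch of truncating after smoothing, assuming positive-valued exponential sample data:

```wolfram
data = RandomVariate[ExponentialDistribution[1], 300];
dist = SmoothKernelDistribution[data];

(* Restrict the smoothed estimate to positive values *)
trunc = TruncatedDistribution[{0, Infinity}, dist];
Plot[PDF[trunc, x], {x, -1, 6}, PlotRange -> All]

(* Probability mass below zero under the truncated estimate *)
Probability[x < 0, x \[Distributed] trunc]
```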
Use with Cases to restrict the data domain before smoothing:
The estimate goes beyond the data on the left but the data is restricted to positive values:
The probability that the data falls below zero is not zero:
Use MaxExtraBandwidths to restrict the domain without dropping data:
The estimate stops at the minimum data value, which is restricted to positive values:
Estimate the distribution of the lengths of human chromosomes:
The expected chromosome length, given that the length is greater than the mean:
Smooth the discrete distribution of the differences of successive primes:
Investigate the distribution of differenced daily returns on the S&P 500 during the 1990s:
Compare the smoothed distribution to a fitted model:
Compare the distribution of salaries from two university departments:
Estimate the joint distribution of Old Faithful eruption durations and waiting times:
Probability an eruption lasts more than two minutes and the waiting time is less than one hour:
Smooth a histogram:
Generate random numbers from the histogram for smoothing:
Smooth an estimate returned from SurvivalDistribution:
Compute the probability of survival beyond 25 given that the survival time is greater than 10:
Create a confidence band for the PDF of snowfall accumulations in Buffalo, New York:
Smooth over each bootstrapped sample and obtain the confidence estimates:
Visualize the estimate of the PDF with the 95% confidence band:
Confirm that the Mahalanobis distance has an asymptotic ChiSquareDistribution[p] given p-dimensional multivariate normal data:
The probability that the Mahalanobis distance will exceed 10, given four-dimensional normal data:
Estimate a heavy-tailed density using parametric tail models:
The body is estimated well but the tails are undersmoothed due to lack of data:
Create a mixture of the kernel density estimate and estimated tail models:
The entire estimate is smooth:
The resulting density estimate integrates to unity:
By default, machine-precision estimates are used:
Use high-precision data to get high-precision estimates:
The PDF is piecewise linear:
The CDF and SurvivalFunction are piecewise quadratic:
The HazardFunction is piecewise rational with linear over quadratic:
SmoothKernelDistribution is a consistent estimator of the underlying distribution:
As the bandwidth approaches infinity, the estimate approaches the shape of the kernel:
The kernel function needs to be a PDF:
The resulting density estimate is not a PDF:
Automatic adaptive bandwidths may be too small with large samples:
Try increasing the initial bandwidth, MaxMixtureKernels, or decreasing the sensitivity:
SmoothKernelDistribution does not know the domain of the underlying distribution:
The estimated PDF is continuous although the underlying distribution is discrete:
The estimated PDF is not bounded to the support of the underlying distribution:
With heavily adaptive bandwidths, these issues may be less obvious:
The tails of some distributions are too heavy to estimate automatically:
In some cases it may be useful to restrict the range of the data:
Compute the distribution of temperature readings near your location:
Estimate the density of volcanic craters in western Uganda:
A region function for a bounding polygon using winding numbers:
New in 8