How to | Perform a Bootstrap Analysis
Suppose that you have a limited amount of data from which to estimate statistics for a population. The sampling distribution of those estimates can be approximated by drawing new samples from the original data and then computing the statistics from each sample. This process is called bootstrapping and can be performed in Mathematica.
You can sample with replacement by using RandomChoice. By including the length of the data (obtained with Length) as the second argument in RandomChoice, a new sample of the same size as the original is generated:
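As a minimal sketch of this resampling step: the dataset `data` below is a hypothetical stand-in (50 values drawn from an exponential distribution), since the original data is not shown here. Any list of numbers could be used instead.

```wolfram
(* Hypothetical stand-in for the original dataset (an assumption) *)
SeedRandom[1];
data = RandomVariate[ExponentialDistribution[1], 50];

(* Draw Length[data] values from data, with replacement *)
resample = RandomChoice[data, Length[data]]
```

Because the draws are made with replacement, some original values typically appear more than once in `resample` and others not at all.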
If the original dataset is representative of the larger population it came from, the resampled values should behave like a sample from that population. Statistics of a sample drawn from the original dataset should thus simulate sample statistics for the population.
Use Skewness to compute the skewness of the original data:
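A short sketch of this computation, assuming a hypothetical dataset `data` in place of the original one:

```wolfram
(* Hypothetical stand-in for the original dataset (an assumption) *)
SeedRandom[1];
data = RandomVariate[ExponentialDistribution[1], 50];

(* Skewness of the original data *)
originalSkewness = Skewness[data]
```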
Compute the skewness of the resampled data:
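The same call can be applied to a resampled dataset. In this sketch, `data` is again a hypothetical stand-in for the original data:

```wolfram
(* Hypothetical stand-in for the original dataset (an assumption) *)
SeedRandom[1];
data = RandomVariate[ExponentialDistribution[1], 50];

(* Resample with replacement, then compute the skewness of the resample *)
resample = RandomChoice[data, Length[data]];
resampledSkewness = Skewness[resample]
```

The value will generally differ from the skewness of the original data, and it changes from one resample to the next.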
By resampling from the original dataset several times and computing the skewness for each of these samples, you can approximate the sampling distribution for the skewness.
Iteratively compute the skewness values for 1000 resampled datasets:
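One way to sketch this iteration is with Table, which re-evaluates its first argument on every pass so that each skewness comes from a fresh resample. The dataset `data` and the name `skews` are assumptions made for illustration.

```wolfram
(* Hypothetical stand-in for the original dataset (an assumption) *)
SeedRandom[1];
data = RandomVariate[ExponentialDistribution[1], 50];

(* 1000 bootstrap replicates of the skewness *)
skews = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];
Length[skews]
```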
You can use Histogram to visualize the sampling distribution for the skewness values of the 1000 resampled datasets:
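A sketch of the plot, assuming the bootstrap replicates are stored in a hypothetical list `skews` built from a stand-in dataset `data`:

```wolfram
(* Hypothetical stand-in data and bootstrap replicates (assumptions) *)
SeedRandom[1];
data = RandomVariate[ExponentialDistribution[1], 50];
skews = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];

(* Histogram of the bootstrap skewness values *)
plot = Histogram[skews, AxesLabel -> {"skewness", "count"}]
```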
The list of resampled values provides a sample space for the estimate, which is skewness in this case, so you can also compute additional statistics.
Estimate the standard error of the skewness estimate. The standard deviation of the resampled estimates approximates the standard error:
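As a sketch, with `data` and `skews` as hypothetical names for the stand-in dataset and its bootstrap replicates:

```wolfram
(* Hypothetical stand-in data and bootstrap replicates (assumptions) *)
SeedRandom[1];
data = RandomVariate[ExponentialDistribution[1], 50];
skews = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];

(* Bootstrap standard error of the skewness *)
standardError = StandardDeviation[skews]
```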
Use the quantiles of the resampled estimates (for example, with Quantile) to obtain a 95% confidence interval for the sample skewness:
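A sketch of one common choice, the percentile interval, which takes the 2.5% and 97.5% quantiles of the bootstrap replicates. The names `data`, `skews`, and `ci` are assumptions, and other bootstrap interval constructions exist.

```wolfram
(* Hypothetical stand-in data and bootstrap replicates (assumptions) *)
SeedRandom[1];
data = RandomVariate[ExponentialDistribution[1], 50];
skews = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];

(* Percentile-based 95% interval: lower and upper endpoints *)
ci = Quantile[skews, {0.025, 0.975}]
```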
More complicated results can also be bootstrapped, such as parameter estimates in a maximum likelihood fitting.
Generate some points from a gamma distribution and display a histogram of the data:
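A sketch of this step follows. The shape and scale values (2 and 3), the sample size of 200, and the name `gammaData` are illustrative assumptions, not values taken from the original example.

```wolfram
(* Illustrative gamma parameters and sample size (assumptions) *)
SeedRandom[2];
gammaData = RandomVariate[GammaDistribution[2, 3], 200];

(* Histogram of the generated points *)
gammaPlot = Histogram[gammaData]
```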
It is often useful to obtain parameter estimates from a dataset, under the assumption that it follows a given distribution. For example, you may wish to use maximum likelihood to estimate the parameters for the data generated from the gamma distribution.
You can obtain the maximum likelihood parameter estimates for the entire dataset by using FindDistributionParameters:
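A sketch of the fit, assuming hypothetical gamma data `gammaData` and the symbols `a` and `b` as placeholder names for the shape and scale parameters. FindDistributionParameters uses maximum likelihood by default and returns the estimates as a list of rules.

```wolfram
(* Illustrative gamma data (an assumption) *)
SeedRandom[2];
gammaData = RandomVariate[GammaDistribution[2, 3], 200];

(* Maximum likelihood estimates, returned as {a -> ..., b -> ...} *)
params = FindDistributionParameters[gammaData, GammaDistribution[a, b]]
```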
You can now bootstrap the statistics for the parameter estimates by replacing the data with a resampling of values from the original dataset. This can be done using RandomChoice to get the resampled dataset:
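In sketch form, the only change from fitting the full dataset is that the first argument becomes a resample. Here `gammaData`, `a`, and `b` are hypothetical names chosen for illustration.

```wolfram
(* Illustrative gamma data (an assumption) *)
SeedRandom[2];
gammaData = RandomVariate[GammaDistribution[2, 3], 200];

(* Fit the gamma parameters to a single resampled dataset *)
resampledParams = FindDistributionParameters[
  RandomChoice[gammaData, Length[gammaData]],
  GammaDistribution[a, b]]
```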
Define this computation as a function:
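One possible definition is sketched below. The function name `bootEstimate` and the parameter symbols `a` and `b` are hypothetical, since the names used in the original example are not shown.

```wolfram
(* Hypothetical helper: resample d with replacement, fit a gamma
   distribution, and return the estimates as the pair {a, b} *)
bootEstimate[d_List] := {a, b} /. FindDistributionParameters[
    RandomChoice[d, Length[d]], GammaDistribution[a, b]]

(* Usage on illustrative gamma data (an assumption) *)
SeedRandom[2];
gammaData = RandomVariate[GammaDistribution[2, 3], 200];
oneEstimate = bootEstimate[gammaData]
```

Returning a plain pair of numbers rather than a list of rules makes it easier to apply functions such as Mean to a collection of estimates.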
You can evaluate this function many times to generate numerous estimates for the parameters, thus giving a sample space for the parameters.
Generate 100 estimates for the parameters:
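A sketch using Table, with the hypothetical helper `bootEstimate` and illustrative data `gammaData` defined inline so that the example runs on its own:

```wolfram
(* Hypothetical helper: fit gamma parameters to one resample of d *)
bootEstimate[d_List] := {a, b} /. FindDistributionParameters[
    RandomChoice[d, Length[d]], GammaDistribution[a, b]]

(* Illustrative gamma data (an assumption) *)
SeedRandom[2];
gammaData = RandomVariate[GammaDistribution[2, 3], 200];

(* 100 bootstrap estimates, one {a, b} pair per row *)
estimates = Table[bootEstimate[gammaData], {100}];
Dimensions[estimates]
```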
To get a bootstrapped estimate of the parameter values, take the mean of the estimates:
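As a sketch, Mean applied to a matrix of estimates averages each column, giving one bootstrapped value per parameter. The names `bootEstimate`, `gammaData`, and `estimates` are assumptions made for illustration.

```wolfram
(* Hypothetical helper: fit gamma parameters to one resample of d *)
bootEstimate[d_List] := {a, b} /. FindDistributionParameters[
    RandomChoice[d, Length[d]], GammaDistribution[a, b]]

(* Illustrative gamma data and 100 bootstrap estimates (assumptions) *)
SeedRandom[2];
gammaData = RandomVariate[GammaDistribution[2, 3], 200];
estimates = Table[bootEstimate[gammaData], {100}];

(* Column-wise mean: bootstrapped estimate of each parameter *)
bootMean = Mean[estimates]
```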
The correlation matrix can be computed for the parameter estimates to assess the relationship between the parameters:
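A sketch of this computation: Correlation applied to a matrix treats each column as a variable, so a two-column matrix of estimates yields a 2×2 correlation matrix. The names `bootEstimate`, `gammaData`, and `estimates` are hypothetical.

```wolfram
(* Hypothetical helper: fit gamma parameters to one resample of d *)
bootEstimate[d_List] := {a, b} /. FindDistributionParameters[
    RandomChoice[d, Length[d]], GammaDistribution[a, b]]

(* Illustrative gamma data and 100 bootstrap estimates (assumptions) *)
SeedRandom[2];
gammaData = RandomVariate[GammaDistribution[2, 3], 200];
estimates = Table[bootEstimate[gammaData], {100}];

(* Correlation matrix of the two parameter estimates *)
corr = Correlation[estimates]
```

The off-diagonal entry indicates how strongly the two estimates move together across resamples.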