How to | Perform a Bootstrap Analysis

Suppose that you have a limited amount of data from which to obtain estimates of statistics for a population. The sampling distribution for those estimates can be approximated by drawing new samples from the original data and then computing statistics from each sample. This process is called bootstrapping and can be performed in the Wolfram Language with RandomChoice.

Begin with a dataset:
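The input for this step is not shown here. As a stand-in, the following sketch simulates a small skewed sample; the exponential distribution, the sample size of 50, and the seed are assumptions chosen only for illustration:

```wolfram
(* Illustrative stand-in data: distribution, size, and seed are assumptions *)
SeedRandom[1234];
data = RandomVariate[ExponentialDistribution[1], 50];

(* Confirm the number of observations *)
Length[data]
```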

You can sample with replacement by using RandomChoice. Passing Length[data] as the second argument generates a new sample of the same size as the original:
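A minimal sketch of this resampling step, using a simulated stand-in for data (an assumption, not the original dataset) and the hypothetical name resample:

```wolfram
(* Stand-in data; distribution, size, and seed are assumptions *)
SeedRandom[1234];
data = RandomVariate[ExponentialDistribution[1], 50];

(* Draw Length[data] values from data with replacement *)
resample = RandomChoice[data, Length[data]];

(* The resample has the same size as the original *)
Length[resample] == Length[data]
```

Because RandomChoice samples with replacement, some original values typically appear more than once in the resample and others not at all.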

Assuming the original dataset is representative of the larger population it came from, the resampled values should behave like a sample from that population. Statistics computed from a resample of the original dataset should thus simulate sample statistics for the population.

Use Skewness to compute the skewness of the original data:
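For example, with a simulated stand-in for data (the distribution, size, and seed are assumptions), the call might look like this:

```wolfram
(* Stand-in data; not the original dataset *)
SeedRandom[1234];
data = RandomVariate[ExponentialDistribution[1], 50];

(* Sample skewness of the original data *)
Skewness[data]
```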

Compute the skewness of the resampled data:
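The same statistic can be applied to one resample. This sketch again assumes a simulated stand-in for data:

```wolfram
(* Stand-in data; not the original dataset *)
SeedRandom[1234];
data = RandomVariate[ExponentialDistribution[1], 50];

(* Skewness of a single resample drawn with replacement *)
Skewness[RandomChoice[data, Length[data]]]
```

Each evaluation of the last line draws a fresh resample, so the value changes from run to run unless the seed is reset.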

By resampling from the original dataset several times and computing the skewness for each of these samples, you can approximate the sampling distribution for the skewness.

Here, Table is used to iteratively compute the skewness values for 1000 resampled datasets:
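One way to write this loop, assuming a simulated stand-in for data and the hypothetical name skews for the results:

```wolfram
(* Stand-in data; not the original dataset *)
SeedRandom[1234];
data = RandomVariate[ExponentialDistribution[1], 50];

(* 1000 bootstrap replicates of the skewness *)
skews = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];

(* One skewness value per resample *)
Length[skews]
```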

You can use Histogram to visualize the sampling distribution for the skewness values of the 1000 resampled datasets:
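A sketch of the plot, with the stand-in data and the name skews being assumptions rather than the original inputs:

```wolfram
(* Stand-in data; not the original dataset *)
SeedRandom[1234];
data = RandomVariate[ExponentialDistribution[1], 50];

(* Bootstrap replicates of the skewness *)
skews = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];

(* Approximate sampling distribution of the skewness *)
Histogram[skews]
```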

The list of skewness values computed from the resamples serves as an empirical sample of the estimate, so you can also compute additional statistics from it.

Estimate the standard error of the skewness estimate. The standard deviation of the bootstrap estimates serves as the standard error:
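As a sketch under the same assumptions (simulated stand-in data, hypothetical name skews), the bootstrap standard error could be computed as:

```wolfram
(* Stand-in data; not the original dataset *)
SeedRandom[1234];
data = RandomVariate[ExponentialDistribution[1], 50];

(* Bootstrap replicates of the skewness *)
skews = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];

(* Standard deviation of the replicates = bootstrap standard error *)
StandardDeviation[skews]
```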

Use Quantile to obtain a 95% confidence interval for the sample skewness:
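A common choice is the percentile interval, which takes the 2.5% and 97.5% quantiles of the bootstrap estimates. The sketch below assumes that method, along with simulated stand-in data and the hypothetical name skews:

```wolfram
(* Stand-in data; not the original dataset *)
SeedRandom[1234];
data = RandomVariate[ExponentialDistribution[1], 50];

(* Bootstrap replicates of the skewness *)
skews = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];

(* Percentile interval: lower and upper 2.5% tails removed *)
Quantile[skews, {0.025, 0.975}]
```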

More complicated results can also be bootstrapped, such as parameter estimates in a maximum likelihood fitting.

Generate some points from a gamma distribution and display a histogram of the data:
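The distribution parameters and sample size for this step are not shown here. The sketch below assumes shape 2, scale 3, and 100 points, and uses the hypothetical name gammaData:

```wolfram
(* Shape 2, scale 3, 100 points, and the seed are assumptions *)
SeedRandom[1234];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Histogram of the simulated gamma data *)
Histogram[gammaData]
```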

It is often useful to obtain parameter estimates from a dataset, under the assumption that it follows a given distribution. For example, you may wish to use maximum likelihood to estimate the α and β parameters for the data generated from the gamma distribution.

You can obtain the maximum likelihood parameter estimates for the entire dataset by using FindDistributionParameters:
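A minimal sketch of the call, assuming simulated gamma data stored under the hypothetical name gammaData. FindDistributionParameters uses maximum likelihood by default and returns replacement rules for α and β:

```wolfram
(* Simulated data; shape 2, scale 3, and 100 points are assumptions *)
SeedRandom[1234];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Maximum likelihood estimates, returned as rules for α and β *)
FindDistributionParameters[gammaData, GammaDistribution[α, β]]
```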

You can now bootstrap the statistics for the parameter estimates by replacing the data with a resampling of values from the original dataset. This can be done using RandomChoice to get the resampled dataset:
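For a single resample, the fit might look like the following sketch, where gammaData is a hypothetical name for simulated data with assumed parameters:

```wolfram
(* Simulated data; shape 2, scale 3, and 100 points are assumptions *)
SeedRandom[1234];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Fit the gamma parameters to one resample drawn with replacement *)
FindDistributionParameters[
  RandomChoice[gammaData, Length[gammaData]],
  GammaDistribution[α, β]]
```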

Define this computation as the function params:
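One possible definition uses a delayed assignment, so that every evaluation of params draws a new resample and refits. The data name gammaData and its generating parameters are assumptions:

```wolfram
(* Simulated data; shape 2, scale 3, and 100 points are assumptions *)
SeedRandom[1234];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* := re-evaluates the right-hand side each time params is used *)
params := {α, β} /. FindDistributionParameters[
    RandomChoice[gammaData, Length[gammaData]],
    GammaDistribution[α, β]];

(* One bootstrap estimate as a pair {α, β} *)
params
```

An immediate assignment with = would instead store a single fixed estimate, which would defeat the resampling.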

You can evaluate params many times to generate numerous estimates for α and β, thus giving an empirical sample of the parameter estimates.

Use Table with params to generate 100 estimates for α and β:
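A sketch of this step, with gammaData, its generating parameters, and the result name estimates all being assumptions:

```wolfram
(* Simulated data; shape 2, scale 3, and 100 points are assumptions *)
SeedRandom[1234];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Each evaluation of params fits a fresh resample *)
params := {α, β} /. FindDistributionParameters[
    RandomChoice[gammaData, Length[gammaData]],
    GammaDistribution[α, β]];

(* 100 bootstrap estimates, one {α, β} pair per row *)
estimates = Table[params, {100}];
Dimensions[estimates]
```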

To get a bootstrapped estimate of the parameter values, take the mean of the estimates:
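Under the same assumptions (simulated gammaData, hypothetical name estimates), the column means could be computed as:

```wolfram
(* Simulated data; shape 2, scale 3, and 100 points are assumptions *)
SeedRandom[1234];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Each evaluation of params fits a fresh resample *)
params := {α, β} /. FindDistributionParameters[
    RandomChoice[gammaData, Length[gammaData]],
    GammaDistribution[α, β]];
estimates = Table[params, {100}];

(* Mean of each column: bootstrapped estimates of α and β *)
Mean[estimates]
```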

The correlation matrix can be computed for the parameter estimates to assess the relationship between the α and β estimates:
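As a sketch, Correlation applied to the matrix of estimates returns the 2×2 correlation matrix of its columns. The data and names below are assumptions:

```wolfram
(* Simulated data; shape 2, scale 3, and 100 points are assumptions *)
SeedRandom[1234];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Each evaluation of params fits a fresh resample *)
params := {α, β} /. FindDistributionParameters[
    RandomChoice[gammaData, Length[gammaData]],
    GammaDistribution[α, β]];
estimates = Table[params, {100}];

(* Correlation matrix between the α and β columns *)
Correlation[estimates]
```

The off-diagonal entry gives the correlation between the α and β estimates across resamples.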