How to | Perform a Bootstrap Analysis

Suppose that you have a limited amount of data from which to obtain estimates of statistics for a population. The sampling distribution for those estimates can be approximated by drawing new samples from the original data and then computing statistics from each sample. This process is called bootstrapping and can be performed in Mathematica with RandomChoice.

Begin with a dataset:

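A minimal sketch of this step, assuming a small hypothetical dataset stored in `data` (the variable name and the values are illustrative only):

```mathematica
(* Hypothetical right-skewed dataset; the values are illustrative only *)
data = {2.1, 0.4, 1.7, 3.9, 0.8, 1.2, 5.6, 0.3, 2.8, 1.1,
        0.9, 4.2, 1.5, 0.6, 2.3, 7.4, 1.0, 0.7, 3.1, 1.9}
```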

You can sample with replacement by using RandomChoice. By giving the length of the dataset, computed with Length, as the second argument in RandomChoice, a new sample of the same size as the original is generated:

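One way to write the resampling step, assuming the same illustrative values and the hypothetical names `data` and `resample`:

```mathematica
(* Hypothetical dataset; the values are illustrative only *)
data = {2.1, 0.4, 1.7, 3.9, 0.8, 1.2, 5.6, 0.3, 2.8, 1.1,
        0.9, 4.2, 1.5, 0.6, 2.3, 7.4, 1.0, 0.7, 3.1, 1.9};

(* Draw Length[data] values from data, with replacement *)
resample = RandomChoice[data, Length[data]]
```

Because the draws are made with replacement, some values typically appear more than once in `resample` and others not at all.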

If the original dataset is representative of the larger population it came from, the resampled values should behave like a sample from that population. Statistics computed from a resample of the original dataset should therefore approximate sample statistics for the population.

Use Skewness to compute the skewness of the original data:

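For a hypothetical dataset `data` (illustrative values), the call might look like this:

```mathematica
(* Hypothetical dataset; the values are illustrative only *)
data = {2.1, 0.4, 1.7, 3.9, 0.8, 1.2, 5.6, 0.3, 2.8, 1.1,
        0.9, 4.2, 1.5, 0.6, 2.3, 7.4, 1.0, 0.7, 3.1, 1.9};

(* Skewness of the original data *)
Skewness[data]
```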

Compute the skewness of the resampled data:

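The same statistic for a single resample can be sketched as follows, again with a hypothetical `data`; the result changes from run to run because the resample is random:

```mathematica
(* Hypothetical dataset; the values are illustrative only *)
data = {2.1, 0.4, 1.7, 3.9, 0.8, 1.2, 5.6, 0.3, 2.8, 1.1,
        0.9, 4.2, 1.5, 0.6, 2.3, 7.4, 1.0, 0.7, 3.1, 1.9};

(* Skewness of one resample drawn with replacement *)
Skewness[RandomChoice[data, Length[data]]]
```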

By resampling from the original dataset many times and computing the skewness for each of these samples, you can approximate the sampling distribution for the skewness.

Here, Table is used to iteratively compute the skewness values for 1000 resampled datasets:

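A sketch of the repeated resampling, assuming a hypothetical dataset `data` and a hypothetical result name `skewnessValues`:

```mathematica
(* Hypothetical dataset; the values are illustrative only *)
data = {2.1, 0.4, 1.7, 3.9, 0.8, 1.2, 5.6, 0.3, 2.8, 1.1,
        0.9, 4.2, 1.5, 0.6, 2.3, 7.4, 1.0, 0.7, 3.1, 1.9};

(* Each of the 1000 iterations draws a fresh resample and computes its skewness *)
skewnessValues = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];
```

The trailing semicolon suppresses the long list of output values.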

You can use Histogram to visualize the sampling distribution for the skewness values of the 1000 resampled datasets:

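Plotting the bootstrap values could look like the following; `data` and `skewnessValues` are hypothetical names, and the dataset is illustrative:

```mathematica
(* Hypothetical dataset; the values are illustrative only *)
data = {2.1, 0.4, 1.7, 3.9, 0.8, 1.2, 5.6, 0.3, 2.8, 1.1,
        0.9, 4.2, 1.5, 0.6, 2.3, 7.4, 1.0, 0.7, 3.1, 1.9};

(* 1000 bootstrap skewness values *)
skewnessValues = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];

(* Histogram with automatically chosen bins *)
Histogram[skewnessValues]
```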

The list of resampled skewness values approximates the sampling distribution of the estimate, so you can also compute additional statistics from it.

Estimate the standard error of the skewness estimate. The standard deviation of the bootstrap estimates serves as the standard error:

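Under the same assumptions (hypothetical `data` and `skewnessValues`), the standard error might be computed like this:

```mathematica
(* Hypothetical dataset; the values are illustrative only *)
data = {2.1, 0.4, 1.7, 3.9, 0.8, 1.2, 5.6, 0.3, 2.8, 1.1,
        0.9, 4.2, 1.5, 0.6, 2.3, 7.4, 1.0, 0.7, 3.1, 1.9};

(* 1000 bootstrap skewness values *)
skewnessValues = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];

(* Bootstrap standard error: the spread of the resampled estimates *)
StandardDeviation[skewnessValues]
```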

Use Quantile to obtain a 95% confidence interval for the skewness from the 2.5% and 97.5% quantiles of the resampled values:

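A percentile-style interval can be sketched as below. The names `data` and `skewnessValues` are hypothetical, and this is one common way to form a bootstrap interval rather than the only one:

```mathematica
(* Hypothetical dataset; the values are illustrative only *)
data = {2.1, 0.4, 1.7, 3.9, 0.8, 1.2, 5.6, 0.3, 2.8, 1.1,
        0.9, 4.2, 1.5, 0.6, 2.3, 7.4, 1.0, 0.7, 3.1, 1.9};

(* 1000 bootstrap skewness values *)
skewnessValues = Table[Skewness[RandomChoice[data, Length[data]]], {1000}];

(* Lower and upper endpoints that enclose the central 95% of the values *)
Quantile[skewnessValues, {0.025, 0.975}]
```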
    

More complicated results can also be bootstrapped, such as parameter estimates in a maximum likelihood fitting.

Generate some points from a gamma distribution and display a histogram of the data:

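A sketch of this step, assuming a shape parameter of 2, a scale parameter of 3, and 100 points. These values, the seed, and the name `gammaData` are assumptions made for illustration:

```mathematica
(* Fixed seed so the sketch is reproducible; an assumption, not a requirement *)
SeedRandom[1];

(* 100 points from a gamma distribution with assumed shape 2 and scale 3 *)
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

Histogram[gammaData]
```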

It is often useful to obtain parameter estimates from a dataset, under the assumption that it follows a given distribution. For example, you may wish to use maximum likelihood to estimate the α and β parameters for the data generated from the gamma distribution.

You can obtain the maximum likelihood parameter estimates for the entire dataset by using FindDistributionParameters:

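A possible form of the call, assuming the hypothetical `gammaData` below. FindDistributionParameters uses maximum likelihood by default and returns the estimates as a list of rules for the symbols α and β, which must have no assigned values:

```mathematica
(* Hypothetical gamma-distributed data; seed, parameters, and size are assumptions *)
SeedRandom[1];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Maximum likelihood estimates of the gamma parameters *)
FindDistributionParameters[gammaData, GammaDistribution[α, β]]
```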

You can now bootstrap the statistics for the parameter estimates by replacing the data with a resampling of values from the original dataset. This can be done using RandomChoice to get the resampled dataset:

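The resampled fit might be written as follows, with the hypothetical `gammaData` standing in for the original dataset:

```mathematica
(* Hypothetical gamma-distributed data; seed, parameters, and size are assumptions *)
SeedRandom[1];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Fit the gamma parameters to one resample instead of the original data *)
FindDistributionParameters[
  RandomChoice[gammaData, Length[gammaData]],
  GammaDistribution[α, β]]
```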

Define this computation as a function:

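One way to package the computation is sketched below. The name `bootstrapEstimate` and its single data argument are hypothetical choices. This version returns the numeric pair {α, β} rather than a list of rules, which makes later statistics easier to compute:

```mathematica
(* Hypothetical helper: resample d with replacement, fit a gamma distribution,
   and return the estimates as the numeric pair {α, β} *)
bootstrapEstimate[d_List] :=
  {α, β} /. FindDistributionParameters[
    RandomChoice[d, Length[d]],
    GammaDistribution[α, β]]
```

SetDelayed (`:=`) ensures that a new resample is drawn every time the function is called.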

You can evaluate this function many times to generate numerous estimates for α and β, giving a sample of bootstrap estimates for the parameters.

Use Table with this function to generate 100 estimates for α and β:

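A self-contained sketch, assuming the hypothetical names `gammaData`, `bootstrapEstimate`, and `estimates`:

```mathematica
(* Hypothetical gamma-distributed data; seed, parameters, and size are assumptions *)
SeedRandom[1];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Hypothetical helper: fit one resample and return {α, β} *)
bootstrapEstimate[d_List] :=
  {α, β} /. FindDistributionParameters[
    RandomChoice[d, Length[d]],
    GammaDistribution[α, β]]

(* 100 bootstrap estimates, one {α, β} pair per row *)
estimates = Table[bootstrapEstimate[gammaData], {100}];
```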

To get a bootstrapped estimate of the parameter values, take the mean of the estimates:

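With the estimates stored as rows of {α, β} pairs, Mean averages each column. The names and data below are hypothetical:

```mathematica
(* Hypothetical gamma-distributed data; seed, parameters, and size are assumptions *)
SeedRandom[1];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Hypothetical helper: fit one resample and return {α, β} *)
bootstrapEstimate[d_List] :=
  {α, β} /. FindDistributionParameters[
    RandomChoice[d, Length[d]],
    GammaDistribution[α, β]]

estimates = Table[bootstrapEstimate[gammaData], {100}];

(* Column means: bootstrapped estimates of α and β, in that order *)
Mean[estimates]
```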

Compute the correlation matrix of the parameter estimates to assess the relationship between the α and β estimates:

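A sketch of the correlation step, using the same hypothetical names and data. Correlation treats each column of the matrix as one variable:

```mathematica
(* Hypothetical gamma-distributed data; seed, parameters, and size are assumptions *)
SeedRandom[1];
gammaData = RandomVariate[GammaDistribution[2, 3], 100];

(* Hypothetical helper: fit one resample and return {α, β} *)
bootstrapEstimate[d_List] :=
  {α, β} /. FindDistributionParameters[
    RandomChoice[d, Length[d]],
    GammaDistribution[α, β]]

estimates = Table[bootstrapEstimate[gammaData], {100}];

(* 2 x 2 correlation matrix of the α and β estimates *)
Correlation[estimates] // MatrixForm
```

The off-diagonal entries give the correlation between the α and β estimates.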