1.5.2 The Asymptotic Distribution of the Sample Correlation Function

Let {Xt} be a stationary process with correlation function ρ(·). Let ρ(h) = (ρ(1), ρ(2), ..., ρ(h)) and let ρ̂(h) = (ρ̂(1), ρ̂(2), ..., ρ̂(h)) be the corresponding vector of sample correlations. It can be shown (see, for example, Brockwell and Davis (1987), p. 214) that under certain general conditions ρ̂(h) has an asymptotic joint normal distribution with mean ρ(h) and covariance matrix C/n as n→∞. The (i, j) element of the matrix C, c_ij, is given by

  c_ij = Σ_{k=1}^{∞} [ρ(k+i) + ρ(k−i) − 2ρ(i)ρ(k)] [ρ(k+j) + ρ(k−j) − 2ρ(j)ρ(k)].  (5.4)

This formula was first derived by Bartlett in 1946 and is called Bartlett's formula. Any stationary ARMA model with {Zt} independently and identically distributed with zero mean and finite variance satisfies the conditions of Bartlett's formula.
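Bartlett's formula can be evaluated numerically by truncating the infinite sum. The following sketch is in Python rather than the book's Mathematica, and the function names and the example MA(1) model (with coefficient 0.5, so ρ(1) = 0.5/1.25 = 0.4) are ours, chosen only for illustration:

```python
def bartlett_c(rho, i, j, kmax=200):
    """(i, j) element of Bartlett's matrix C, truncating the sum at kmax.

    rho(k) must accept any integer lag and return the true correlation.
    """
    total = 0.0
    for k in range(1, kmax + 1):
        a = rho(k + i) + rho(k - i) - 2 * rho(i) * rho(k)
        b = rho(k + j) + rho(k - j) - 2 * rho(j) * rho(k)
        total += a * b
    return total

# Hypothetical MA(1) model X_t = Z_t + 0.5 Z_{t-1}: rho(1) = 0.4, rho(k) = 0 for k > 1
def rho_ma1(k):
    k = abs(k)
    if k == 0:
        return 1.0
    if k == 1:
        return 0.4
    return 0.0

# For i > 1 the formula reduces to c_ii = 1 + 2*rho(1)**2 = 1.32
print(round(bartlett_c(rho_ma1, 2, 2), 6))  # 1.32
# At lag 1 it gives the known MA(1) value 1 - 3*rho(1)**2 + 4*rho(1)**4 = 0.6224
print(round(bartlett_c(rho_ma1, 1, 1), 6))  # 0.6224
```

Truncating at kmax is exact here, since every bracketed term vanishes once k exceeds the model's order plus the lag.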
Hence for large n, the sample correlation at lag i, ρ̂(i), is approximately normally distributed with mean ρ(i) and variance c_ii/n, where

  c_ii = Σ_{k=1}^{∞} [ρ(k+i) + ρ(k−i) − 2ρ(i)ρ(k)]².  (5.5)

Bartlett's formula, (5.4) or (5.5), is extremely useful since it gives us a handle on deciding whether a small value in the sample correlation function is in fact significantly different from zero or is just the result of fluctuations due to the smallness of n. Next we give two examples where Bartlett's formula is used to determine if the sample correlation is zero.

Example 5.2  For a sequence of independently and identically distributed white noise {Zt}, ρ(0) = 1 and ρ(i) = 0 for i ≠ 0. The above formula, (5.5), reduces to (for i ≠ 0)

  c_ii = 1,

since the only surviving term in the sum is the k = i term, [ρ(0)]² = 1. That is, for large n, ρ̂(i) is approximately normally distributed with mean zero and variance 1/n for i = 1, 2, ..., h. This implies that 95 percent of the time the plot of ρ̂(i) should fall within the bounds ±1.96/√n. In practice, 2 rather than 1.96 is often used in calculating the bounds.
Here we generate a normally distributed random sequence of length 200 with mean 0 and variance 1.5, and calculate its sample correlation function up to lag 50. The random number generator is seeded first, a random sequence of length 200 with distribution N(0, 1.5) is generated, and the sample correlation function up to lag 50 is computed. The sample correlation function and the bounds are then displayed together using Show; the function Plot is used to plot the two constant functions ±1.96/√n that form the bounds.
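The session above is in Mathematica and its inputs are not reproduced here; purely as an illustrative sketch, the same experiment can be run in Python with NumPy (the seed and function name are ours, not the book's):

```python
import numpy as np

def sample_corr(x, max_lag):
    """Sample correlation function rho_hat(0), ..., rho_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / n
    return np.array([np.dot(xc[:n - h], xc[h:]) / n / gamma0
                     for h in range(max_lag + 1)])

rng = np.random.default_rng(0)                 # seed the generator
data = rng.normal(0.0, np.sqrt(1.5), 200)      # N(0, 1.5), length 200
corr = sample_corr(data, 50)                   # rho_hat up to lag 50
bound = 1.96 / np.sqrt(len(data))              # white-noise bounds +-1.96/sqrt(n)

inside = np.abs(corr[1:]) <= bound
print(f"{inside.mean():.0%} of lags 1..50 fall within the bounds")
```

Under the white-noise hypothesis we expect roughly 95 percent of the plotted lags to stay inside the band; an occasional excursion is consistent with chance.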
We see that ρ̂(k) falls within the bounds for all k > 0. We have no reason to reject the hypothesis that the set of data constitutes a realization of a white noise process.
You can also define your own function that plots a given correlation function together with the bounds. For example, you can define a function myplotcorr1 that takes the correlation function and the bound as arguments and plots both. You can also define a function that makes the same plot as myplotcorr1 but takes the data rather than the correlation function as an argument; note that in that case the bound is fixed to be 1.96/√n and the correlation function is plotted out to the maximum lag.

Example 5.3  For an MA(q) process, ρ(k) = 0 for k > q. From Bartlett's formula (5.5), it is easy to see that for i > q only the ρ(k−i) term in each bracket survives. Therefore, for i > q we have

  c_ii = 1 + 2(ρ²(1) + ρ²(2) + ... + ρ²(q)).  (5.6)

If the data of length n (n large) are truly a realization of an MA(q) process, we expect the sample correlation function at lags i > q to fall within the bounds given by ±1.96√(c_ii/n) about 95 percent of the time. In practice, the true correlation function is unknown and (5.6) is used with the sample correlation function ρ̂ in place of ρ.
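As a rough Python analogue of the two Mathematica plotting functions just described (these names and the report-instead-of-plot design are ours), one can write helpers that compare each sample correlation against the bound:

```python
import numpy as np

def corr_vs_bounds(corr, bound):
    """For each lag > 0, report whether rho_hat(lag) lies outside +-bound.

    Analogue of the book's myplotcorr1, reporting rather than drawing.
    """
    corr = np.asarray(corr, dtype=float)
    return {lag: abs(r) > bound for lag, r in enumerate(corr) if lag > 0}

def corr_vs_bounds_from_data(data, max_lag):
    """Analogue of the data-based variant: the bound is fixed at 1.96/sqrt(n)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    xc = data - data.mean()
    g0 = np.dot(xc, xc) / n
    corr = [np.dot(xc[:n - h], xc[h:]) / n / g0 for h in range(max_lag + 1)]
    return corr_vs_bounds(corr, 1.96 / np.sqrt(n))
```

For instance, corr_vs_bounds([1.0, 0.5, 0.05], 0.14) flags lag 1 as outside the band and lag 2 as inside. A plotting front end (e.g. with matplotlib) can be layered on top of these in the same way Show and Plot are combined in the Mathematica session.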
Here we are given a set of stationary, zero-mean data of length 200 generated from an MA(2) process Xt = Zt − 0.4Zt−1 + 1.1Zt−2, and we would like to determine the process that generated the data. After seeding the random number generator, a time series of length 200 is generated from the given MA(2) model. We first calculate the sample correlation function of the series up to lag 50 (stored in corr) and display it together with the bounds for a white noise process using myplotcorr1.
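As an illustrative Python sketch of this step (again not the book's Mathematica code; the seed is arbitrary), we can simulate the MA(2) model, compute the sample correlations, and also compute the theoretical ρ(1) and ρ(2) for comparison:

```python
import numpy as np

# MA(2) coefficients at lags 0, 1, 2 of X_t = Z_t - 0.4 Z_{t-1} + 1.1 Z_{t-2}
theta = np.array([1.0, -0.4, 1.1])

# Theoretical correlations rho(0), rho(1), rho(2) of the MA(2) process
gamma = [theta[:theta.size - k] @ theta[k:] for k in range(3)]
rho_true = np.array(gamma) / gamma[0]

# Simulate a realization of length 200 and compute its sample correlations
rng = np.random.default_rng(1)     # seed the generator (arbitrary seed)
n = 200
z = rng.standard_normal(n + 2)
x = z[2:] - 0.4 * z[1:-1] + 1.1 * z[:-2]
xc = x - x.mean()
g0 = xc @ xc / n
corr = np.array([xc[:n - h] @ xc[h:] / n / g0 for h in range(51)])

bound = 1.96 / np.sqrt(n)          # white-noise bound
print("rho(1), rho(2) =", rho_true[1].round(3), rho_true[2].round(3))
print("bound =", round(bound, 3))
```

Since the true ρ(1) ≈ −0.354 and ρ(2) ≈ 0.464 both lie well outside the white-noise bound 1.96/√200 ≈ 0.139, the sample correlations at lags 1 and 2 should stand out clearly in the plot.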
Since the sample correlations at lags 1 and 2, ρ̂(1) and ρ̂(2), are well beyond the bound, we conclude that they differ significantly from zero and that the data are not likely to be white noise. Since the correlations beyond lag 2 are all rather small, we may suspect that the data can be modeled by an MA(2) process. The variance of ρ̂(k) for k > 2 can be calculated using (5.6), with the sample correlation function replacing the true correlation function; that is, we calculate c_kk = 2(ρ̂²(0) + ρ̂²(1) + ρ̂²(2)) − 1.
We first get the sample correlations up to lag 2 by extracting the first three elements of corr using Take.
We square the list (each individual element is squared) and then use Apply to add all the elements together.
We have doubled the sum and subtracted 1 to correct for counting ρ̂²(0) (= 1) twice. Now we can display the sample correlation function again with the bounds we just calculated, ±1.96√(c_kk/n).
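The arithmetic of this bound is easy to check by hand; the sketch below does it in Python with hypothetical sample-correlation values (the actual values from the session are not reproduced in the text):

```python
import numpy as np

# Hypothetical sample correlations at lags 0..2 (corr[0] is always 1);
# in the session these are obtained with Take[corr, 3].
corr3 = np.array([1.0, -0.33, 0.45])
n = 200

# Estimate of c_kk for k > 2: double the sum of squares and subtract 1,
# correcting for counting rho_hat(0)^2 = 1 twice.
c_kk = 2 * np.sum(corr3 ** 2) - 1
bound = 1.96 * np.sqrt(c_kk / n)

print(round(c_kk, 4), round(bound, 4))  # 1.6228 0.1766
```

Note that the resulting band is wider than the white-noise band 1.96/√n ≈ 0.1386, as it must be, since (5.6) adds the nonzero low-lag correlations to the variance.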
This affirms the reasonableness of our conclusion that the data are from an MA(2) process.