Statistics`DescriptiveStatistics`
Descriptive statistics refers to properties of distributions, such as location, dispersion, and shape. The functions in this package compute descriptive statistics of lists of data. You can calculate some of the standard descriptive statistics for various known distributions by using the Statistics`ContinuousDistributions` and Statistics`DiscreteDistributions` packages. This package also provides some commonly used data transformations.
Note that this package is automatically loaded when most other statistical packages are used. For example, all the functions described below are available for use with the package Statistics`HypothesisTests`.
The statistics are calculated assuming that each value of data has probability equal to 1/n, where n is the number of elements in the data.
Location statistics.
Location statistics describe where the data are located. The most common functions include measures of central tendency like the mean, median, and mode. Quantile[data, q] gives the location before which (100 q) percent of the data lie. In other words, Quantile gives a value z such that the probability that a data value is less than z is less than or equal to q, and the probability that a data value is less than or equal to z is greater than or equal to q. The interpolated quantile values at q = 0.25, 0.5, and 0.75 are called the quartiles, and you can obtain them using Quartiles.
Note that the functions Mean, Median, and Quantile are found in the kernel. This package does not need to be loaded to use them on lists of data.
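The defining property of a quantile can be checked directly on a small data set. The following is a minimal sketch that uses only kernel functions; the names n, q, z, below, and atOrBelow are illustrative and are not part of the package.

```mathematica
(* Same values as the data set used in the session. *)
data = {6.5, 3.8, 6.6, 5.7, 6.0, 6.4, 5.3};
n = Length[data];
q = 1/4;
z = Quantile[data, q];

(* Fraction of values strictly below z, and fraction at or below z,
   with each value given probability 1/n. *)
below = Count[data, x_ /; x < z]/n;
atOrBelow = Count[data, x_ /; x <= z]/n;

(* Both entries are True when z satisfies the defining property. *)
{below <= q, atOrBelow >= q}
```

The same check can be repeated for other values of q between 0 and 1.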
This loads the package.
In[1]:= <<Statistics`DescriptiveStatistics`
Here is a data set.
In[2]:= data = {6.5, 3.8, 6.6, 5.7, 6.0, 6.4, 5.3}
Out[2]=
This gives some general location information about the data.
In[3]:= LocationReport[data]
Out[3]=
You can use the replacement operator /. to extract a particular statistic from the report.
In[4]:= m = Mean /. %
Out[4]=
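This is the mean when the smallest entry in the list is excluded. The argument {1/7, 0} trims the fraction 1/7 of the data from the low end and nothing from the high end; with seven values, that removes exactly one entry. TrimmedMean lets you describe the data with outliers removed.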
In[5]:= TrimmedMean[data, {1/7, 0}]
Out[5]=
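As a cross-check on the trimmed mean, you can drop the smallest value by hand and average the rest. This is only a sketch for this particular data set, not a description of how TrimmedMean is implemented; the name trimmedByHand is illustrative.

```mathematica
data = {6.5, 3.8, 6.6, 5.7, 6.0, 6.4, 5.3};

(* Sort ascending, drop the single smallest value, and average the
   remaining six values. *)
trimmedByHand = Mean[Rest[Sort[data]]]
```

The remaining six values sum to 36.5, so the result is about 6.083.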
Dispersion statistics.
Dispersion statistics summarize the scatter or spread of the data. Most of these functions describe deviation from a particular location. For instance, variance is a measure of deviation from the mean, and standard deviation is just the square root of the variance.
The range is a value describing the total spread of the data. SampleRange gives the difference between the largest and smallest values in data, while InterquartileRange gives the difference between the third and first quartiles.
Note that the functions Variance and StandardDeviation are found in the kernel. This package does not need to be loaded to use them on lists of data.
This gives an unbiased estimate for the variance of the data, with n - 1 as the divisor, where n is the number of data values.
In[6]:= var1 = Variance[data]
Out[6]=
Here is the maximum likelihood estimate, which divides by the number of data values n.
In[7]:= var2 = VarianceMLE[data]
Out[7]=
We can check the relationship between the two estimators.
In[8]:= var1 (Length[data] - 1) == var2 Length[data]
Out[8]=
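The two variance estimates differ only in the divisor applied to the same sum of squared deviations from the mean. The sketch below builds both from that sum using kernel functions only; the names n, ss, unbiased, mle, and range are illustrative.

```mathematica
data = {6.5, 3.8, 6.6, 5.7, 6.0, 6.4, 5.3};
n = Length[data];

(* Sum of squared deviations from the mean. *)
ss = Total[(data - Mean[data])^2];

unbiased = ss/(n - 1);   (* divisor n - 1 *)
mle = ss/n;              (* divisor n *)

(* Total spread: largest value minus smallest value. *)
range = Max[data] - Min[data];

{unbiased, mle, unbiased (n - 1) == mle n, range}
```

The third entry should be True, since both sides reduce to the same sum of squares.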
Shape statistics.
You can get some information about the shape of a distribution using shape statistics. Skewness describes the amount of asymmetry. Kurtosis measures the concentration of data around the peak and in the tails versus the concentration in the flanks.
Skewness is calculated by dividing the third central moment by the cube of the standard deviation. Pearson's two coefficients provide two other well-known measures of skewness. PearsonSkewness1 and PearsonSkewness2 are found by multiplying three times the difference between the mean and either the mode or the median, respectively, and dividing this quantity by the standard deviation of the sample. QuartileSkewness gives a measure of asymmetry within the first and third quartiles.
Kurtosis is calculated by dividing the fourth central moment by the square of the variance of the data. KurtosisExcess is shifted so that it is zero for the normal distribution, positive for distributions with a prominent peak and heavy tails, and negative for distributions with prominent flanks.
Here is the second central moment, which is the same as the maximum likelihood estimate of variance.
In[9]:= CentralMoment[data, 2]
Out[9]=
A negative value for skewness indicates that the distribution underlying the data has a long left-sided tail.
In[10]:= Skewness[data]
Out[10]=
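The shape statistics described above can be written out in terms of central moments. The following is a sketch under one stated assumption: the standard deviation and variance in the denominators are taken to be the maximum likelihood versions (divisor n), so that the variance equals the second central moment. The helper cm and the names skew, kurt, and excess are illustrative and are not package functions.

```mathematica
data = {6.5, 3.8, 6.6, 5.7, 6.0, 6.4, 5.3};

(* k-th central moment: average of the k-th powers of the deviations
   from the mean. *)
cm[k_] := Mean[(data - Mean[data])^k];

(* Assumption: denominators use the maximum likelihood variance cm[2]. *)
skew = cm[3]/cm[2]^(3/2);   (* third moment over cube of std. deviation *)
kurt = cm[4]/cm[2]^2;       (* fourth moment over square of variance *)
excess = kurt - 3;          (* zero for the normal distribution *)

{skew, kurt, excess}
```

For this data set the skewness comes out negative, because the single low value 3.8 pulls the left tail out.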
Expected value.
Other location, dispersion, and shape statistics can be computed by taking the expected value of a function with respect to the sample distribution of the data.
This gives the average square root of the data.
In[11]:= ExpectedValue[Sqrt[#]&, data]
Out[11]=
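Because each data value is given probability 1/n, the expected value of a function over the sample reduces to a plain average of the function values. Here is a minimal sketch of the square root example using kernel functions only; the name avgSqrt is illustrative.

```mathematica
data = {6.5, 3.8, 6.6, 5.7, 6.0, 6.4, 5.3};

(* Sqrt threads over lists, so this is the sum of the square roots
   divided by the number of values. *)
avgSqrt = Total[Sqrt[data]]/Length[data]
```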
Data transformations.
Occasionally it is useful to apply transformations to the data using descriptive statistics. ZeroMean shifts the data to have zero mean, and Standardize both shifts and scales the data to have unit variance. The default is to standardize using the unbiased estimate of variance; the maximum likelihood estimate is selected using the option MLE -> True.
The mean of the shifted data is approximately 0.
In[12]:= Mean[ZeroMean[data]]
Out[12]=
After standardizing, the variance is approximately 1.
In[13]:= Variance[Standardize[data]]
Out[13]=
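Both transformations can be written out by hand, which makes clear why the results are only approximately 0 and 1 in machine arithmetic. This sketch uses the unbiased standard deviation, matching the default described above; the names shifted and standardized are illustrative.

```mathematica
data = {6.5, 3.8, 6.6, 5.7, 6.0, 6.4, 5.3};

(* Shift so the mean is zero. *)
shifted = data - Mean[data];

(* Shift and scale by the unbiased standard deviation (divisor n - 1). *)
standardized = shifted/StandardDeviation[data];

{Mean[shifted], Variance[standardized]}
```

To mimic the maximum likelihood variant, you could divide by Sqrt[Mean[shifted^2]] instead of StandardDeviation[data].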
