# Descriptive Statistics

Descriptive statistics refers to properties of distributions, such as location, dispersion, and shape. The functions described here compute descriptive statistics of lists of data. You can calculate some of the standard descriptive statistics for various known distributions by using the functions described in "Continuous Distributions" and "Discrete Distributions".

The statistics are calculated assuming that each value of data $x_i$ has probability equal to $1/n$, where $n$ is the number of elements in the data.

| Function | Description |
| --- | --- |
| `Mean[data]` | average value |
| `Median[data]` | median (central value) |
| `Commonest[data]` | list of the elements with highest frequency |
| `GeometricMean[data]` | geometric mean |
| `HarmonicMean[data]` | harmonic mean |
| `RootMeanSquare[data]` | root mean square |
| `TrimmedMean[data, f]` | mean of remaining entries, when a fraction `f` is removed from each end of the sorted list of data |
| `TrimmedMean[data, {f1, f2}]` | mean of remaining entries, when fractions `f1` and `f2` are dropped from the lower and upper ends of the sorted data, respectively |
| `Quantile[data, q]` | $q$th quantile |
| `Quartiles[data]` | list of the $\frac{1}{4}$, $\frac{1}{2}$, $\frac{3}{4}$ quantiles of the elements in `data` |

Location statistics describe where the data is located. The most common functions include measures of central tendency like the mean, median, and mode. `Quantile[data, q]` gives the location before which $100q$ percent of the data lie. In other words, `Quantile` gives a value $z$ such that the probability that $x_i < z$ is less than or equal to $q$ and the probability that $x_i \le z$ is greater than or equal to $q$.
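
As a small illustration of the location functions listed above, the following sketch applies them to a made-up list. The sample values are an assumption chosen only for this example, and outputs are noted only where they follow directly from the definitions.

```wolfram
(* Hypothetical sample data, chosen only for illustration *)
data = {1, 2, 2, 3, 4, 7, 9};

Mean[data]        (* 28/7, that is, 4 *)
Median[data]      (* middle element of the sorted list: 3 *)
Commonest[data]   (* {2}, the only value that occurs twice *)

(* Numerical values of the other means *)
N[{GeometricMean[data], HarmonicMean[data], RootMeanSquare[data]}]

(* Drop a fraction 1/5 of the entries from each end before averaging *)
TrimmedMean[data, 1/5]

(* A single quantile with default parameters, then the three quartiles *)
Quantile[data, 1/4]
Quartiles[data]
```

Note that `Quartiles` and `Quantile` use different default interpolation parameters, so `Quartiles[data]` need not coincide with `Quantile[data, {1/4, 1/2, 3/4}]`.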

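| Function | Description |
| --- | --- |
| `Variance[data]` | unbiased estimate of variance, $\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$, where $\bar{x}$ is the mean |
| `StandardDeviation[data]` | unbiased estimate of standard deviation |
| `MeanDeviation[data]` | mean absolute deviation, $\frac{1}{n}\sum_{i=1}^{n}\lvert x_i-\bar{x}\rvert$ |
| `MedianDeviation[data]` | median absolute deviation, median of $\lvert x_i-\tilde{x}\rvert$ values, where $\tilde{x}$ is the median |
| `InterquartileRange[data]` | difference between the first and third quartiles |
| `QuartileDeviation[data]` | half the interquartile range |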

Dispersion statistics summarize the scatter or spread of the data. Most of these functions describe deviation from a particular location. For instance, variance is a measure of deviation from the mean, and standard deviation is just the square root of the variance.
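
The relationships between the dispersion functions and the location statistics they are built on can be checked directly. This is a minimal sketch on made-up exact data (the sample list and the symbols `n`, `q1`, `q2`, `q3` are assumptions for illustration). With default settings, each comparison is expected to evaluate to `True`.

```wolfram
(* Hypothetical sample data, chosen only for illustration *)
data = {1, 2, 2, 3, 4, 7, 9};
n = Length[data];

(* Unbiased variance: squared deviations from the mean, divided by n - 1 *)
Variance[data] == Total[(data - Mean[data])^2]/(n - 1)

(* Standard deviation is the square root of the variance *)
StandardDeviation[data] == Sqrt[Variance[data]]

(* Mean and median absolute deviations *)
MeanDeviation[data] == Mean[Abs[data - Mean[data]]]
MedianDeviation[data] == Median[Abs[data - Median[data]]]

(* Quartile-based spread *)
{q1, q2, q3} = Quartiles[data];
InterquartileRange[data] == q3 - q1
QuartileDeviation[data] == (q3 - q1)/2
```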

| Function | Description |
| --- | --- |
| `Covariance[v1, v2]` | covariance coefficient between lists `v1` and `v2` |
| `Covariance[m]` | covariance matrix for the matrix `m` |
| `Covariance[m1, m2]` | covariance matrix for the matrices `m1` and `m2` |
| `Correlation[v1, v2]` | correlation coefficient between lists `v1` and `v2` |
| `Correlation[m]` | correlation matrix for the matrix `m` |
| `Correlation[m1, m2]` | correlation matrix for the matrices `m1` and `m2` |

Covariance and correlation statistics.

Covariance is the multivariate extension of variance. For two vectors of equal length, the covariance is a number. For a single matrix $m$, the $(i, j)$th element of the covariance matrix is the covariance between the $i$th and $j$th columns of $m$. For two matrices $m_1$ and $m_2$, the $(i, j)$th element of the covariance matrix is the covariance between the $i$th column of $m_1$ and the $j$th column of $m_2$.

While covariance measures dispersion, correlation measures association. The correlation between two vectors is equivalent to the covariance between the vectors divided by the product of the standard deviations of the vectors. Likewise, the elements of a correlation matrix are equivalent to the elements of the corresponding covariance matrix scaled by the appropriate column standard deviations.
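
The scaling relationship between covariance and correlation, and the column-wise reading of the matrix forms, can be sketched as follows. The vectors `v1` and `v2` are made-up values for illustration only.

```wolfram
(* Hypothetical vectors of equal length *)
v1 = {1, 2, 3, 4};
v2 = {2, 4, 5, 9};

(* Covariance of two vectors is a single number *)
Covariance[v1, v2]

(* Correlation is the covariance scaled by both standard deviations;
   the two numerical values below are expected to agree *)
N[Covariance[v1, v2]/(StandardDeviation[v1] StandardDeviation[v2])]
N[Correlation[v1, v2]]

(* Build a matrix whose columns are v1 and v2 *)
m = Transpose[{v1, v2}];
Covariance[m] // MatrixForm
Correlation[m] // MatrixForm

(* Element (1, 2) is the covariance between columns 1 and 2,
   and the diagonal holds the column variances *)
Covariance[m][[1, 2]] == Covariance[v1, v2]
Covariance[m][[1, 1]] == Variance[v1]
```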

| Function | Description |
| --- | --- |
| `CentralMoment[data, r]` | $r$th central moment |
| `Skewness[data]` | coefficient of skewness |
| `Kurtosis[data]` | kurtosis coefficient |
| `QuartileSkewness[data]` | quartile skewness coefficient |

You can get some information about the shape of a distribution using shape statistics. Skewness describes the amount of asymmetry. Kurtosis measures the concentration of data around the peak and in the tails versus the concentration in the flanks.

Skewness is calculated by dividing the third central moment by the cube of the population standard deviation. Kurtosis is calculated by dividing the fourth central moment by the square of the population variance of the data. The population variance is the second central moment, equivalent to `CentralMoment[data, 2]`, and the population standard deviation is its square root.

`QuartileSkewness` is calculated from the quartiles of data. It is equivalent to $(q_1 - 2q_2 + q_3)/(q_3 - q_1)$, where $q_1$, $q_2$, and $q_3$ are the first, second, and third quartiles respectively.
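
The definitions of the shape statistics in terms of central moments and quartiles can be checked numerically. In this sketch the sample list and the helper symbols `m2`, `q1`, `q2`, `q3` are assumptions for illustration. Each pair is expected to contain two matching values.

```wolfram
(* Hypothetical sample data, chosen only for illustration *)
data = {1, 2, 2, 3, 4, 7, 9};

(* Second central moment, that is, the population variance *)
m2 = CentralMoment[data, 2];

(* Skewness: third central moment over the cube of the population standard deviation *)
N[{Skewness[data], CentralMoment[data, 3]/m2^(3/2)}]

(* Kurtosis: fourth central moment over the square of the population variance *)
N[{Kurtosis[data], CentralMoment[data, 4]/m2^2}]

(* Quartile skewness from the three quartiles *)
{q1, q2, q3} = Quartiles[data];
N[{QuartileSkewness[data], (q1 - 2 q2 + q3)/(q3 - q1)}]
```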

| Function | Description |
| --- | --- |
| `Expectation[f[x], x \[Distributed] list]` | expected value of the function `f` of `x` with respect to the values of `list` |

The expectation or expected value of a function $f$ is $\frac{1}{n}\sum_{i=1}^{n} f(x_i)$ for the list of values $x_1$, $x_2$, ..., $x_n$. Many descriptive statistics are expectations. For instance, the mean is the expected value of $x$, and the $r$th central moment is the expected value of $(x-\bar{x})^r$, where $\bar{x}$ is the mean of the $x_i$.
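
To illustrate how the mean and a central moment arise as expectations, the sketch below compares `Expectation` over a data list with the dedicated functions. The sample list and the symbol `mu` are assumptions for illustration, and the `x \[Distributed] data` form is assumed to be available in the version in use. Each pair is expected to contain two equal values.

```wolfram
(* Hypothetical sample data, chosen only for illustration *)
data = {1, 2, 2, 3, 4, 7, 9};
mu = Mean[data];

(* The mean is the expected value of x *)
{Expectation[x, x \[Distributed] data], Mean[data]}

(* The second central moment is the expected value of (x - mu)^2 *)
{Expectation[(x - mu)^2, x \[Distributed] data], CentralMoment[data, 2]}

(* The same average written out directly as (1/n) Sum f(x_i) *)
Mean[(# - mu)^2 & /@ data]
```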
