
# Descriptive Statistics

Descriptive statistics refers to properties of distributions, such as location, dispersion, and shape. The functions described here compute descriptive statistics of lists of data. You can calculate some of the standard descriptive statistics for various known distributions by using the functions described in "Continuous Distributions" and "Discrete Distributions".
The statistics are calculated assuming that each data value $x_i$ has probability $1/n$, where $n$ is the number of elements in the data.

| Function | Description |
| --- | --- |
| Mean[data] | average value |
| Median[data] | median (central value) |
| Commonest[data] | list of the elements with highest frequency |
| GeometricMean[data] | geometric mean |
| HarmonicMean[data] | harmonic mean |
| RootMeanSquare[data] | root mean square |
| TrimmedMean[data, f] | mean of the remaining entries when a fraction f is removed from each end of the sorted list of data |
| TrimmedMean[data, {f1, f2}] | mean of the remaining entries when fractions f1 and f2 are dropped from the low and high ends, respectively, of the sorted data |
| Quantile[data, q] | qth quantile |
| Quartiles[data] | list of the 1/4, 1/2, and 3/4 quantiles of the elements in data |

Location statistics.

Location statistics describe where the data are located. The most common functions include measures of central tendency like the mean, median, and mode. Quantile[data, q] gives the location before which (100 q) percent of the data lie. In other words, Quantile gives a value z such that the probability that $x_i < z$ is less than or equal to q and the probability that $x_i \le z$ is greater than or equal to q.
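
The two inequalities in that definition can be checked directly. The sketch below uses a made-up list named sample (illustrative values, not the tutorial's dataset) and counts the fraction of elements below, and at or below, the value that Quantile returns with its default settings.

```mathematica
(* Hypothetical sample; not the tutorial's original dataset *)
sample = {2, 4, 4, 5, 7, 8, 12};
q = 1/2;
z = Quantile[sample, q];

(* Empirical probabilities, with each element weighted 1/n *)
pBelow = Count[sample, x_ /; x < z]/Length[sample];
pAtOrBelow = Count[sample, x_ /; x <= z]/Length[sample];

(* Both inequalities from the definition should hold *)
{z, pBelow <= q, pAtOrBelow >= q}   (* {5, True, True} *)
```

Here 3 of the 7 elements lie strictly below 5 and 4 of the 7 lie at or below it, so 3/7 <= 1/2 <= 4/7.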
Here is a dataset.
This finds the mean and median of the data.
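
As a hedged illustration, the call might look like the following. The list data is a made-up stand-in, not the tutorial's dataset.

```mathematica
(* Hypothetical dataset used only for illustration *)
data = {2, 4, 4, 5, 7, 8, 12};

(* Mean and median in one list *)
{Mean[data], Median[data]}   (* {6, 5} *)

(* The mode, returned as a list because ties are possible *)
Commonest[data]              (* {4} *)
```

With exact integer input the results stay exact; wrapping the input in N would give machine-precision numbers instead.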
This is the mean when the smallest entry in the list is excluded. TrimmedMean lets you describe the data with outliers removed.
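
One way to express "exclude the smallest entry" is to trim a fraction 1/n from the low end and nothing from the high end. The sketch below assumes a made-up list data (not the tutorial's dataset) and compares the result with a manual computation.

```mathematica
(* Hypothetical dataset used only for illustration *)
data = {2, 4, 4, 5, 7, 8, 12};
n = Length[data];

(* Trim exactly one element (fraction 1/n) from the low end only *)
trimmed = TrimmedMean[data, {1/n, 0}];

(* The same quantity by hand: sort, drop the first element, average *)
byHand = Mean[Rest[Sort[data]]];   (* 20/3 *)

trimmed == byHand
```

Using the exact fraction 1/n avoids any question of how a fractional element count would be rounded.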
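
| Function | Description |
| --- | --- |
| Variance[data] | unbiased estimate of variance, $\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$, with $\bar{x}$ the mean of the data |
| StandardDeviation[data] | square root of the unbiased variance estimate |
| MeanDeviation[data] | mean absolute deviation, $\frac{1}{n}\sum_{i=1}^{n}\lvert x_i-\bar{x}\rvert$ |
| MedianDeviation[data] | median absolute deviation, the median of the $\lvert x_i-\mathrm{median}\rvert$ values |
| InterquartileRange[data] | difference between the first and third quartiles |
| QuartileDeviation[data] | half the interquartile range |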

Dispersion statistics.

Dispersion statistics summarize the scatter or spread of the data. Most of these functions describe deviation from a particular location. For instance, variance is a measure of deviation from the mean, and standard deviation is just the square root of the variance.
This gives an unbiased estimate for the variance of the data with n-1 as the divisor.
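
To make the n-1 divisor concrete, the following sketch computes the variance by hand and compares it with Variance. The list data is a made-up example, not the tutorial's dataset.

```mathematica
(* Hypothetical dataset used only for illustration *)
data = {2, 4, 4, 5, 7, 8, 12};
n = Length[data];

(* Sum of squared deviations from the mean, divided by n - 1 *)
byHand = Total[(data - Mean[data])^2]/(n - 1);

{Variance[data], byHand}   (* {11, 11} *)

(* The standard deviation is the square root of that estimate *)
StandardDeviation[data] == Sqrt[Variance[data]]   (* True *)
```

Dividing by n instead would give the population variance, 66/7 for this list.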
This compares three types of deviation.
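
A comparison of this kind might be written as below, again on a made-up list (not the tutorial's dataset). The three measures differ in the center they use (mean or median) and in how deviations are aggregated.

```mathematica
(* Hypothetical dataset used only for illustration *)
data = {2, 4, 4, 5, 7, 8, 12};

(* Standard deviation, mean absolute deviation, median absolute deviation *)
devs = {StandardDeviation[data], MeanDeviation[data], MedianDeviation[data]}
(* {Sqrt[11], 18/7, 2} *)

(* Numerical values for easier comparison *)
N[devs]
```

The single large value 12 inflates the standard deviation more than the other two measures, and the median deviation is affected least.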

| Function | Description |
| --- | --- |
| Covariance[v1, v2] | covariance coefficient between lists v1 and v2 |
| Covariance[m] | covariance matrix for the matrix m |
| Covariance[m1, m2] | covariance matrix for the matrices m1 and m2 |
| Correlation[v1, v2] | correlation coefficient between lists v1 and v2 |
| Correlation[m] | correlation matrix for the matrix m |
| Correlation[m1, m2] | correlation matrix for the matrices m1 and m2 |

Covariance and correlation statistics.

Covariance is the multivariate extension of variance. For two vectors of equal length, the covariance is a number. For a single matrix m, the (i, j)th element of the covariance matrix is the covariance between the ith and jth columns of m. For two matrices m1 and m2, the (i, j)th element of the covariance matrix is the covariance between the ith column of m1 and the jth column of m2.
While covariance measures dispersion, correlation measures association. The correlation between two vectors is the covariance between the vectors divided by the product of their standard deviations. Likewise, each element of a correlation matrix is the corresponding element of the covariance matrix divided by the standard deviations of the two columns involved.
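
For the two-vector case, the relationship can be sketched as follows. The vectors v1 and v2 are made-up examples, and Chop is applied to the numerical difference so the comparison does not depend on how exact radicals are simplified.

```mathematica
(* Hypothetical vectors of equal length *)
v1 = {1, 2, 4, 5};
v2 = {2, 1, 5, 8};

(* Covariance of two vectors is a single number *)
Covariance[v1, v2]   (* 16/3 *)

(* Correlation as covariance over the product of standard deviations *)
r = Covariance[v1, v2]/(StandardDeviation[v1] StandardDeviation[v2]);

(* The difference from the built-in Correlation should vanish *)
Chop[N[r - Correlation[v1, v2]]]
```

Because of this scaling, the correlation is unit-free and lies between -1 and 1, while the covariance carries the units of both vectors.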
This gives the covariance between data and a random vector.
Here is a random matrix.
This is the correlation matrix for the matrix m.
This is the covariance matrix.
Scaling the covariance matrix terms by the appropriate standard deviations gives the correlation matrix.
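
A minimal sketch of that scaling, assuming a small exact matrix m in place of the random one so the result is reproducible. The matrix values are illustrative only.

```mathematica
(* Hypothetical matrix: 4 observations (rows) of 3 variables (columns) *)
m = {{1, 2, 0}, {2, 1, 3}, {4, 5, 1}, {5, 8, 2}};

cov = Covariance[m];
sd = StandardDeviation[m];   (* one standard deviation per column *)

(* Element (i, j) of cov is the covariance of columns i and j *)
cov[[1, 2]] == Covariance[m[[All, 1]], m[[All, 2]]]   (* True *)

(* Divide element (i, j) by sd[[i]] sd[[j]], then compare with Correlation *)
scaled = cov/Outer[Times, sd, sd];
Chop[N[scaled - Correlation[m]]]   (* expected: a 3 x 3 matrix of zeros *)
```

Outer[Times, sd, sd] builds the matrix of pairwise products of column standard deviations, so the division is elementwise.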

| Function | Description |
| --- | --- |
| CentralMoment[data, r] | rth central moment |
| Skewness[data] | coefficient of skewness |
| Kurtosis[data] | kurtosis coefficient |
| QuartileSkewness[data] | quartile skewness coefficient |

Shape statistics.

You can get some information about the shape of a distribution using shape statistics. Skewness describes the amount of asymmetry. Kurtosis measures the concentration of data around the peak and in the tails versus the concentration in the flanks.
Skewness is calculated by dividing the third central moment by the cube of the population standard deviation. Kurtosis is calculated by dividing the fourth central moment by the square of the population variance. The population variance is the second central moment, CentralMoment[data, 2], and the population standard deviation is its square root.
QuartileSkewness is calculated from the quartiles of data. It is equivalent to $(q_1 - 2 q_2 + q_3)/(q_3 - q_1)$, where $q_1$, $q_2$, and $q_3$ are the first, second, and third quartiles, respectively.
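
These definitions can be checked against the built-in functions. The sketch below assumes a made-up list data (not the tutorial's dataset) and compares each shape statistic with the formula stated above.

```mathematica
(* Hypothetical dataset used only for illustration *)
data = {2, 4, 4, 5, 7, 8, 12};

mu2 = CentralMoment[data, 2];   (* population variance, 66/7 *)
mu3 = CentralMoment[data, 3];   (* 144/7 *)
mu4 = CentralMoment[data, 4];   (* 1602/7 *)

(* Skewness: third central moment over the cubed population standard deviation *)
Chop[N[Skewness[data] - mu3/mu2^(3/2)]]

(* Kurtosis: fourth central moment over the squared population variance *)
Kurtosis[data] == mu4/mu2^2

(* Quartile skewness from the three quartiles *)
{q1, q2, q3} = Quartiles[data];
QuartileSkewness[data] == (q1 - 2 q2 + q3)/(q3 - q1)
```

For this list the third central moment is positive, so the skewness is positive, which corresponds to a longer right-hand tail.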
Here is the second central moment of the data.
A negative value for skewness indicates that the distribution underlying the data has a long left-sided tail.

| Function | Description |
| --- | --- |
| ExpectedValue[f, list] | expected value of the pure function f with respect to the values in list |
| ExpectedValue[f[x], list, x] | expected value of the function f of x with respect to the values of list |

Expected values.

The expected value of a function f for the list of values $x_1, x_2, \ldots, x_n$ is $\frac{1}{n}\sum_{i=1}^{n} f(x_i)$. Many descriptive statistics are expected values. For instance, the mean is the expected value of x, and the rth central moment is the expected value of $(x-\bar{x})^r$, where $\bar{x}$ is the mean of the $x_i$.
Here is the expected value of the Log of the data.
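
Since every value has weight 1/n, the expected value of f over a list is the mean of f applied to each element. The sketch below uses a hypothetical helper ev and a made-up list data, neither of which comes from the tutorial. It avoids calling ExpectedValue directly, on the assumption that its availability may depend on the Mathematica version in use.

```mathematica
(* Hypothetical dataset used only for illustration *)
data = {2, 4, 4, 5, 7, 8, 12};

(* Hypothetical helper: average of f over the list, each value weighted 1/n *)
ev[f_, list_] := Total[f /@ list]/Length[list];

(* Expected value of Log over the data, evaluated numerically *)
ev[Log, N[data]]

(* The mean is the expected value of x *)
ev[Identity, data] == Mean[data]   (* True *)

(* The second central moment is the expected value of (x - mean)^2 *)
ev[(# - Mean[data])^2 &, data] == CentralMoment[data, 2]   (* True *)
```

Where ExpectedValue is available, ExpectedValue[Log, data] would be expected to agree with the first result.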