Basic Statistics
Basic descriptive statistics operations.
Given a list with n elements $x_i$, the mean Mean[list] is defined to be $\mu(x) = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
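The variance Variance[list] is defined to be $\mathrm{var}(x) = \sigma^2(x) = \sum_{i=1}^{n} (x_i - \mu(x))^2 / (n-1)$ for real data, where $\mu(x)$ is the mean of the list. (For complex data, $\mathrm{var}(x) = \sigma^2(x) = \sum_{i=1}^{n} (x_i - \mu(x)) \overline{(x_i - \mu(x))} / (n-1)$.)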
The standard deviation StandardDeviation[list] is defined to be $\sigma(x) = \sqrt{\mathrm{var}(x)}$.
If the elements in list are thought of as being selected at random according to some probability distribution, then the mean gives an estimate of where the center of the distribution is located, while the standard deviation gives an estimate of how wide the dispersion in the distribution is.
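As a quick check of these definitions, the following sketch evaluates the three functions on a small sample and then recomputes each value directly from its formula. The list `data` is an illustrative assumption, chosen so that every result is an integer.

```mathematica
data = {1, 3, 5};  (* illustrative sample, not taken from the text *)

Mean[data]                (* 3 *)
Variance[data]            (* 4 *)
StandardDeviation[data]   (* 2 *)

(* The same values written out from the definitions *)
Total[data]/Length[data]                           (* 3 *)
Total[(data - Mean[data])^2]/(Length[data] - 1)    (* 4 *)
Sqrt[Variance[data]]                               (* 2 *)
```

Note the divisor n-1 rather than n in the variance: with n in the denominator the middle line would give 8/3 instead of 4.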
The median Median[list] effectively gives the value at the halfway point in the sorted version of list. It is often considered a more robust measure of the center of a distribution than the mean, since it depends less on outlying values.
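A small illustration of that robustness, using a made-up list with one outlying value:

```mathematica
data = {1, 2, 3, 100};  (* illustrative sample with one outlier *)

Median[data]   (* 5/2, the average of the two middle elements *)
Mean[data]     (* 53/2, pulled upward by the outlier *)
```

Replacing 100 by any larger value leaves the median at 5/2 while the mean keeps growing.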
The qth quantile Quantile[list, q] effectively gives the value that is q of the way through the sorted version of list.
For a list of length n, Mathematica defines Quantile[list, q] to be s[[Ceiling[n q]]], where s is Sort[list, Less]. There are, however, about ten other definitions of quantile in use, all potentially giving slightly different results.
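The default definition can be checked by hand on a short list. This is a minimal sketch; the list and the names `s` and `n` are chosen here only for illustration.

```mathematica
list = {5, 1, 4, 2, 3};   (* illustrative data *)
s = Sort[list, Less];     (* {1, 2, 3, 4, 5} *)
n = Length[list];

Quantile[list, 1/4]       (* 2 *)
s[[Ceiling[n (1/4)]]]     (* 2, since Ceiling[5/4] is 2 *)
Quantile[list, 1/2]       (* 3, since Ceiling[5/2] is 3 *)
```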
Mathematica covers the common cases by introducing four quantile parameters in the form Quantile[list, q, {{a, b}, {c, d}}]. The parameters a and b in effect define where in the list should be considered a fraction q of the way through. If this corresponds to an integer position, then the element at that position is taken to be the qth quantile. If it is not an integer position, then a linear combination of the elements on either side is used, as specified by c and d.
The position in a sorted list s for the qth quantile is taken to be $k = a + (n + b)\,q$. If k is an integer, then the quantile is $s_k$. Otherwise, it is $s_{\lfloor k \rfloor} + (s_{\lceil k \rceil} - s_{\lfloor k \rfloor})\,(c + d\,(k - \lfloor k \rfloor))$, with the indices taken to be 1 or n if they are out of range.
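The rule above can be transcribed almost literally. The function `quantileSketch` below is a hypothetical helper written for this illustration, not a built-in; it assumes exact (integer or rational) values of q and the parameters, so that IntegerQ reliably detects an integer position.

```mathematica
(* quantileSketch is a hypothetical name used only for illustration *)
quantileSketch[list_, q_, {{a_, b_}, {c_, d_}}] :=
 Module[{s = Sort[list], n = Length[list], k, lo, hi},
  k = a + (n + b) q;               (* position in the sorted list *)
  lo = Clip[Floor[k], {1, n}];     (* out-of-range indices become 1 or n *)
  hi = Clip[Ceiling[k], {1, n}];
  If[IntegerQ[k],
   s[[lo]],
   s[[lo]] + (s[[hi]] - s[[lo]]) (c + d (k - Floor[k]))]]

data = {10, 20, 30, 40};  (* illustrative data *)

quantileSketch[data, 1/2, {{1/2, 0}, {0, 1}}]   (* 25 *)
Quantile[data, 1/2, {{1/2, 0}, {0, 1}}]         (* 25 *)
```

Here k = 1/2 + 4 (1/2) = 5/2, so the result lies halfway between the second and third sorted elements.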
| {{0,0},{1,0}} | inverse empirical CDF (default) |
| {{0,0},{0,1}} | linear interpolation (California method) |
| {{1/2,0},{0,0}} | element numbered closest to qn |
| {{1/2,0},{0,1}} | linear interpolation (hydrologist method) |
| {{0,1},{0,1}} | mean-based estimate (Weibull method) |
| {{1,-1},{0,1}} | mode-based estimate |
| {{1/3,1/3},{0,1}} | median-based estimate |
| {{3/8,1/4},{0,1}} | normal distribution estimate |
Common choices for quantile parameters.
Whenever d=0, the value of the qth quantile is always equal to some actual element in list, so that the result changes discontinuously as q varies. For d=1, the qth quantile interpolates linearly between successive elements in list. Median is defined to use such an interpolation.
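The difference between the two cases shows up as soon as q falls between list positions. In this sketch the data are made up for illustration; the parameter sets are taken from the table of common choices.

```mathematica
data = {10, 20, 30, 40};  (* illustrative data *)

(* d = 0 (default parameters): the result is always an element of the list *)
Quantile[data, 1/3]                       (* 20 *)

(* d = 1: linear interpolation between neighboring elements *)
Quantile[data, 1/3, {{0, 0}, {0, 1}}]     (* 40/3 *)

(* Median agrees with an interpolating quantile at q = 1/2 *)
Median[data]                              (* 25 *)
Quantile[data, 1/2, {{1/2, 0}, {0, 1}}]   (* 25 *)
```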
Note that Quantile[list, q] yields quartiles when q=m/4 and percentiles when q=m/100.
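For example, passing a list of values for q returns several quantiles at once. The data here are illustrative, and the results use the default quantile definition.

```mathematica
(* The three quartiles of 1, 2, ..., 8 *)
Quantile[Range[8], {1/4, 1/2, 3/4}]   (* {2, 4, 6} *)

(* The 95th percentile of 1, 2, ..., 100 *)
Quantile[Range[100], 95/100]          (* 95 *)
```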
| Mean[{x1,x2,...}] | the mean of the xi |
| Mean[{{x1,y1,...},{x2,y2,...},...}] | a list of the means of the xi, yi, ... |
Handling multidimensional data.
Sometimes each item in your data may involve a list of values. The basic statistics functions in Mathematica automatically apply to all corresponding elements in these lists.
Applied to a list of such items, Mean separately finds the mean of each "column" of data.
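A minimal sketch of this column-wise behavior, using made-up numeric pairs:

```mathematica
data = {{1, 10}, {2, 20}, {3, 30}};  (* illustrative {x, y} pairs *)

Mean[data]   (* {2, 20}: the mean of the x values, then of the y values *)
```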
Note that you can extract the elements in the ith "column" of a multidimensional list using list[[All, i]].
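For instance, with an illustrative list of pairs, the second column can be pulled out and summarized on its own:

```mathematica
data = {{1, 10}, {2, 20}, {3, 30}};  (* illustrative {x, y} pairs *)

data[[All, 2]]         (* {10, 20, 30} *)
Mean[data[[All, 2]]]   (* 20 *)
```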