EmpiricalDistribution
EmpiricalDistribution[{x1,x2,…}]
represents an empirical distribution based on the data values xi.
EmpiricalDistribution[{{x1,y1,…},{x2,y2,…},…}]
represents a multivariate empirical distribution based on the data values {xi,yi,…}.
EmpiricalDistribution[{w1,w2,…}->{d1,d2,…}]
represents an empirical distribution where data values di occur with weights wi.
Details
- EmpiricalDistribution returns a DataDistribution object that can be used like any other probability distribution.
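- The cumulative distribution function for EmpiricalDistribution for a value x is given by $\frac{1}{n}\sum_{i=1}^{n}\mathrm{Boole}[x_i \le x]$, that is, the fraction of the n data values xi that do not exceed x.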
- EmpiricalDistribution can be used with such functions as Mean, CDF, and RandomVariate.
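As a minimal sketch of these points, assuming a small hypothetical data list, the CDF can be checked against a hand-written fraction of data values not exceeding x, and the distribution can be passed to Mean and RandomVariate. The helper name cdfByHand is illustrative only.

```wolfram
(* Hypothetical data, chosen only for illustration *)
data = {1, 2, 2, 3};
dist = EmpiricalDistribution[data];

(* Fraction of data values that are <= x *)
cdfByHand[x_] := Count[data, v_ /; v <= x]/Length[data];

(* Built-in CDF and the hand-written version at x = 2 *)
{CDF[dist, 2], cdfByHand[2]}

(* The distribution works with standard distribution functions *)
Mean[dist]
RandomVariate[dist, 5]
```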
Examples
Basic Examples (2)
Create an empirical distribution of univariate data:
Visualize distribution functions:
Compute moments and quantiles:
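The three steps above might look as follows. This is a sketch on a hypothetical data list, not the data used on the original page.

```wolfram
(* Hypothetical univariate data *)
data = {1, 2, 2, 3, 5};
dist = EmpiricalDistribution[data];

(* PDF is a set of point masses, CDF is a step function *)
DiscretePlot[PDF[dist, x], {x, 0, 6}]
Plot[CDF[dist, x], {x, 0, 6}, Exclusions -> None]

(* Moments and a quantile *)
{Mean[dist], Variance[dist], Quantile[dist, 1/2]}
```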
Create an empirical distribution of bivariate data:
Visualize the estimated CDF:
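A sketch of the bivariate case, assuming four hypothetical data points:

```wolfram
(* Hypothetical bivariate data: each row is one {x, y} observation *)
data = {{1, 2}, {2, 1}, {2, 3}, {3, 3}};
dist = EmpiricalDistribution[data];

(* The estimated CDF is a step surface *)
Plot3D[CDF[dist, {x, y}], {x, 0, 4}, {y, 0, 4}, Exclusions -> None]

(* Fraction of points with x <= 2 and y <= 2 *)
CDF[dist, {2, 2}]
```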
Scope (19)
Basic Uses (10)
Create an empirical distribution of univariate data:
Larger datasets lead to better approximations of the underlying distribution:
Construct empirical distribution for quantity data:
Calculate selected descriptive statistics:
Specify a list of weights corresponding to each data value:
A general moment of the distribution:
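A sketch of the weighted form, using hypothetical weights that already sum to 1 so that no assumption about normalization is needed:

```wolfram
(* Hypothetical weights and data values; weights sum to 1 *)
dist = EmpiricalDistribution[{1/4, 1/2, 1/4} -> {10, 20, 30}];

(* Each value carries its weight as probability mass *)
PDF[dist, 20]

(* Second raw moment: 100/4 + 400/2 + 900/4 *)
Moment[dist, 2]
```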
Create an empirical distribution of bivariate data:
Larger datasets produce smoother estimates:
Specify a list of weights for bivariate data:
Create an empirical distribution of data in higher dimensions:
Plot the univariate marginal CDFs:
Plot the bivariate marginal CDFs:
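One way to sketch the marginal plots is with MarginalDistribution, shown here on hypothetical three-dimensional data:

```wolfram
(* Hypothetical trivariate data *)
data = {{1, 2, 3}, {2, 1, 3}, {2, 3, 1}, {3, 3, 2}};
dist = EmpiricalDistribution[data];

(* Univariate marginal CDFs for coordinates 1, 2 and 3 *)
Plot[Evaluate[Table[CDF[MarginalDistribution[dist, i], x], {i, 3}]],
 {x, 0, 4}, Exclusions -> None]

(* Bivariate marginal CDF for coordinates 1 and 2 *)
Plot3D[CDF[MarginalDistribution[dist, {1, 2}], {x, y}],
 {x, 0, 4}, {y, 0, 4}, Exclusions -> None]
```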
When the input is a TimeSeries, EmpiricalDistribution uses only the values and ignores the time stamps:
Compare to using the values only:
When the input is a TemporalData object, EmpiricalDistribution uses all the values together:
Distribution Properties (9)
Obtain empirical estimates of distribution functions:
PDF and HazardFunction are discrete:
CDF and SurvivalFunction are piecewise constant:
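A sketch of both statements on hypothetical data. The PDF is nonzero only at the data values, while the CDF and survival function stay flat between them.

```wolfram
(* Hypothetical data *)
dist = EmpiricalDistribution[{1, 2, 2, 3, 5}];

(* Point masses at the data values *)
DiscretePlot[PDF[dist, x], {x, 0, 6}]
DiscretePlot[HazardFunction[dist, x], {x, 0, 6}]

(* Step functions that jump only at the data values *)
Plot[{CDF[dist, x], SurvivalFunction[dist, x]}, {x, 0, 6},
 Exclusions -> None]

(* CDF and survival function are complementary *)
CDF[dist, 2] + SurvivalFunction[dist, 2]
```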
Estimate the quantile function:
Generate a set of random numbers:
Compare the histogram to the PDF of the underlying density:
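A sketch of this comparison, assuming the underlying distribution is a standard normal (an assumption made here for illustration only):

```wolfram
SeedRandom[1];

(* Assumed underlying distribution: standard normal *)
data = RandomVariate[NormalDistribution[], 1000];
dist = EmpiricalDistribution[data];

(* Draw from the empirical distribution *)
sample = RandomVariate[dist, 10^4];

(* Overlay the histogram of draws on the underlying density *)
Show[
 Histogram[sample, Automatic, "PDF"],
 Plot[PDF[NormalDistribution[], x], {x, -4, 4}]
]
```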
Compute probabilities and expectations:
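For example, on a hypothetical four-point data list, Probability and Expectation reduce to simple averages over the data:

```wolfram
(* Hypothetical data *)
dist = EmpiricalDistribution[{1, 2, 2, 3}];

(* Three of the four values exceed 1 *)
Probability[x > 1, x \[Distributed] dist]

(* Average of the squares: (1 + 4 + 4 + 9)/4 *)
Expectation[x^2, x \[Distributed] dist]
```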
Estimate bivariate distribution functions:
CDF and SurvivalFunction are piecewise constant:
Applications (8)
Compare the distribution of data to a theoretical distribution:
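A sketch of such a comparison, using simulated data and an assumed gamma model. Both choices are hypothetical and stand in for real data and a fitted candidate.

```wolfram
SeedRandom[2];

(* Simulated data standing in for real observations *)
data = RandomVariate[GammaDistribution[2, 1], 200];

(* Overlay the empirical CDF on the theoretical CDF *)
Plot[{CDF[EmpiricalDistribution[data], x],
  CDF[GammaDistribution[2, 1], x]}, {x, 0, 8},
 Exclusions -> None,
 PlotLegends -> {"empirical", "theoretical"}]
```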
Compare multivariate data to a theoretical distribution:
Produce a smoothed representation with SmoothKernelDistribution:
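A sketch on simulated data (hypothetical, drawn from a standard normal), comparing the step-function CDF with its kernel-smoothed counterpart:

```wolfram
SeedRandom[3];
data = RandomVariate[NormalDistribution[], 100];

(* Step-function estimate versus kernel-smoothed estimate *)
Plot[{CDF[EmpiricalDistribution[data], x],
  CDF[SmoothKernelDistribution[data], x]}, {x, -3, 3},
 Exclusions -> None,
 PlotLegends -> {"empirical", "smooth kernel"}]
```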
Using HistogramDistribution with bin delimiters set to the data creates a linear interpolation of EmpiricalDistribution:
Ten letters published in 1861 under the name Quintus Curtius Snodgrass are claimed to have been authored by Mark Twain. Compare the word length distribution for the letters to some works by Mark Twain:
Comparison to the English language in general emphasizes the similarity:
A test for goodness of fit suggests, however, that Twain did not write the QCS letters:
Compare the distributions of winning times in Scottish hill races for those who take the high road and those who take the low road:
Plot record times vs. elevation gain:
Find the median elevation gain:
Split races at the median elevation gain to the high and low roads:
It appears that it is faster to take the low road:
The record times of high road races vary more than those of low road races:
The National Institutes of Health estimates that 2% of the population has a certain disease. A test for the disease is proposed that detects its presence 95% of the time with a false positive rate of 5%. Given that a patient tests positive, find the probability that he or she actually has the disease:
Equations for the unknown probabilities based on the information given:
Solve the equations, assuming the probabilities sum to unity:
The probability a patient has the disease given a positive test result:
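One possible sketch of this calculation builds the joint distribution of disease status and test result directly from the stated rates, rather than solving equations as described above. The variable names and the 1/0 encoding are assumptions made for illustration.

```wolfram
(* Rates stated in the text: prevalence, detection rate, false positive rate *)
pD = 2/100; sens = 95/100; fpr = 5/100;

(* Joint probabilities over {disease, test}, with 1 = yes and 0 = no *)
weights = {pD sens, pD (1 - sens), (1 - pD) fpr, (1 - pD) (1 - fpr)};
outcomes = {{1, 1}, {1, 0}, {0, 1}, {0, 0}};
joint = EmpiricalDistribution[weights -> outcomes];

(* Probability of disease given a positive test *)
Probability[d == 1 \[Conditioned] t == 1, {d, t} \[Distributed] joint]

(* Cross-check with Bayes' rule by hand: 19/68, about 0.279 *)
pD sens/(pD sens + (1 - pD) fpr)
```

Despite the 95% detection rate, fewer than three in ten positive results correspond to actual disease, because the disease is rare.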
A group of 21 students was selected at random to participate in a new directed reading program. A control group of 23 students was educated with traditional methods. Reading test scores for students in the two groups were recorded following their programs. Perform a permutation-based test on the scores to determine if the directed reading program was successful:
The mean difference in test scores across the groups can be used as a test statistic:
Simulate the null distribution of the test statistic by randomly permuting the groups:
At the 5% level, there is evidence that the new program made a difference:
LocationTest could have been used to test the hypothesis directly:
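The permutation procedure can be sketched as follows. The scores below are hypothetical and smaller than the groups described above, since the study data is not reproduced here. The permutation count of 5000 is also an arbitrary choice.

```wolfram
SeedRandom[5];

(* Hypothetical scores, not the data from the study described above *)
treated = {57, 61, 49, 62, 58, 67, 53, 56, 59, 52};
control = {42, 55, 46, 60, 48, 54, 37, 62, 43, 51};

(* Test statistic: difference in group means *)
observed = Mean[treated] - Mean[control];

(* Null distribution: reassign group labels at random *)
pooled = Join[treated, control];
n = Length[treated];
null = Table[
   With[{perm = RandomSample[pooled]},
    Mean[Take[perm, n]] - Mean[Drop[perm, n]]],
   {5000}];
nullDist = EmpiricalDistribution[null];

(* One-sided p-value: share of permutations at least as extreme *)
pValue = N[Probability[x >= observed, x \[Distributed] nullDist]]

(* Direct test of the same one-sided hypothesis *)
LocationTest[{treated, control}, 0, AlternativeHypothesis -> "Greater"]
```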
Properties & Relations (8)
Random number generation from an empirical distribution returns a bootstrapped sample:
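In other words, drawing from the distribution resamples the original data with replacement. A sketch on a hypothetical list:

```wolfram
SeedRandom[4];

(* Hypothetical data *)
data = {3, 1, 4, 1, 5, 9, 2, 6};

(* A bootstrap sample of the same size as the data *)
boot = RandomVariate[EmpiricalDistribution[data], Length[data]]

(* Resampling with replacement directly, for comparison *)
RandomChoice[data, Length[data]]
```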
EmpiricalDistribution is a consistent estimator of the underlying distribution:
Moments of the distribution equal the corresponding moments of the data:
The population rather than the sample variance is used for empirical distributions:
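Both points can be checked on a small hypothetical list. The sample variance divides by n - 1, so it differs from the distribution's variance by the factor (n - 1)/n.

```wolfram
(* Hypothetical data *)
data = {1, 2, 3, 4};
dist = EmpiricalDistribution[data];
n = Length[data];

(* Raw moments agree with those of the data *)
Mean[dist] == Mean[data]
Moment[dist, 3] == Moment[data, 3]

(* Variance[data] is the sample variance; rescale to compare *)
Variance[dist] == (n - 1)/n Variance[data]
```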
Quantiles are equivalent to Quantile applied directly to the data:
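A sketch of this equivalence on hypothetical data:

```wolfram
(* Hypothetical data and quantile levels *)
data = {1, 2, 2, 3, 5};
qs = {1/4, 1/2, 3/4};

(* Quantiles of the distribution versus quantiles of the raw data *)
Quantile[EmpiricalDistribution[data], qs] == Quantile[data, qs]
```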
EmpiricalDistribution is equivalent to SurvivalDistribution with no censoring:
Use the union of data values as bin delimiters for HistogramDistribution:
The resulting PDF is a zero-order interpolation of the PDF for EmpiricalDistribution:
Applying N to exact data can reduce memory consumption:
EmpiricalDistribution on integers can be specified using ProbabilityDistribution:
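As a sketch, assuming a hypothetical integer data list, the same point masses can be written out by hand as a discrete ProbabilityDistribution and compared with the empirical distribution:

```wolfram
(* Hypothetical integer data: 1 once, 2 twice, 3 once *)
emp = EmpiricalDistribution[{1, 2, 2, 3}];

(* The same masses written out explicitly on the integers 1..3 *)
pd = ProbabilityDistribution[
   Piecewise[{{1/4, x == 1}, {1/2, x == 2}, {1/4, x == 3}}],
   {x, 1, 3, 1}];

(* Compare means and point probabilities *)
Mean[emp] == Mean[pd]
Table[PDF[emp, k] == PDF[pd, k], {k, 1, 3}]
```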
Text
Wolfram Research (2010), EmpiricalDistribution, Wolfram Language function, https://reference.wolfram.com/language/ref/EmpiricalDistribution.html (updated 2016).