How to | Perform Operations on Subgroups of Data
When summarizing data, it is often useful to analyze it by subgroup. For example, crop yields could be categorized by seed variety, or average patient recovery time by patient age or drug type. The Wolfram Language lets you split columns of data based on the values in other columns. You can then compute the desired statistics on the resulting groups.
Average yields by soil type can be computed by extracting the soil type from each group and taking the Mean of the yields (the last element of each record) in that group. With the three soil-type groups as the possible values of the iterator variable x, you can use Table to get these results.
Use Table to get the soil type and average yield for each group:
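A minimal sketch of this step, assuming each record has the form {soil type, seed type, yield}. The cropData values and the names bySoilType and soilMeans are hypothetical and chosen only for illustration:

```wolfram
(* Hypothetical data: each record is {soil type, seed type, yield} *)
cropData = {
   {"clay", "A", 30}, {"clay", "B", 34},
   {"loam", "A", 40}, {"loam", "B", 44},
   {"sand", "A", 20}, {"sand", "B", 24}
   };

(* Gather records that share the same first element (soil type) *)
bySoilType = GatherBy[cropData, First];

(* For each group x, take the soil type from its first record
   and average the last elements (the yields) *)
soilMeans = Table[{x[[1, 1]], Mean[x[[All, -1]]]}, {x, bySoilType}]
(* {{"clay", 32}, {"loam", 42}, {"sand", 22}} *)
```

Here x[[1, 1]] reads the soil type from the first record of the group, and x[[All, -1]] collects the yields of every record in it.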
Use the pure function #[[2]]& with GatherBy to gather the data by seed type:
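As a sketch, assuming the seed type is the second element of each record, a pure function that extracts that element can serve as the grouping key. The cropData values below are hypothetical:

```wolfram
(* Hypothetical data: each record is {soil type, seed type, yield} *)
cropData = {
   {"clay", "A", 30}, {"clay", "B", 34},
   {"loam", "A", 40}, {"loam", "B", 44},
   {"sand", "A", 20}, {"sand", "B", 24}
   };

(* #[[2]] & returns the second element of a record, so records
   with the same seed type end up in the same sublist *)
bySeedType = GatherBy[cropData, #[[2]] &]
(* {{{"clay", "A", 30}, {"loam", "A", 40}, {"sand", "A", 20}},
    {{"clay", "B", 34}, {"loam", "B", 44}, {"sand", "B", 24}}} *)
```

GatherBy keeps the groups in the order in which each key first appears, so seed type "A" comes first here.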
Computing the means by seed type is very similar to what was done for soil type. However, the newly grouped bySeedType data is used instead. Also, the second element is extracted to get the seed type.
Use Table to compute the mean yield for each seed type:
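The same pattern can be sketched for seed types. The data and the name seedMeans are hypothetical, and bySeedType is rebuilt here only so that the example runs on its own:

```wolfram
(* Hypothetical data: each record is {soil type, seed type, yield} *)
cropData = {
   {"clay", "A", 30}, {"clay", "B", 34},
   {"loam", "A", 40}, {"loam", "B", 44},
   {"sand", "A", 20}, {"sand", "B", 24}
   };

bySeedType = GatherBy[cropData, #[[2]] &];

(* x[[1, 2]] is the seed type of the first record in group x;
   x[[All, -1]] is the list of yields in that group *)
seedMeans = Table[{x[[1, 2]], Mean[x[[All, -1]]]}, {x, bySeedType}]
(* {{"A", 30}, {"B", 34}} *)
```

The only change from the soil-type computation is the index used to read the group label: 2 instead of 1.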
Use Table with describe to compute descriptive statistics by drug. The ordering of results for each drug type matches that used in the definition of describe (sample size, mean, median, sample range):
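A sketch of this step under stated assumptions: the records are taken to have the form {age, drug, recovery time}, the patientData values are hypothetical, and the definition of describe below is an assumed one that follows the ordering stated above (sample size, mean, median, sample range):

```wolfram
(* Hypothetical data: each record is {age, drug, recovery time in days} *)
patientData = {
   {23, "A", 10}, {35, "B", 14}, {38, "A", 12},
   {41, "B", 18}, {27, "A", 14}, {44, "B", 16}
   };

(* Assumed definition: {sample size, mean, median, sample range} *)
describe[values_List] := {Length[values], Mean[values],
   Median[values], Max[values] - Min[values]};

(* Gather by the second element (drug), then describe the
   recovery times (last elements) of each group *)
byDrug = GatherBy[patientData, #[[2]] &];
drugStats = Table[{x[[1, 2]], describe[x[[All, -1]]]}, {x, byDrug}]
(* {{"A", {3, 12, 12, 4}}, {"B", {3, 16, 16, 4}}} *)
```

Wrapping the statistics in a single function keeps the Table expression short and guarantees that every group is summarized in the same order.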
When grouping by age, you may want to create groups that correspond to a range of ages instead of just individual ages. In this example, the patient's decade of life corresponds to his or her age group.
To create these groups, each age is divided by 10 and IntegerPart is then used to take the digits to the left of the decimal point. GatherBy is used to gather the data into age groups based on this number:
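The grouping can be sketched as follows, assuming the age is the first element of each record. The patientData values and the name byAgeGroup are hypothetical:

```wolfram
(* Hypothetical data: each record is {age, drug, recovery time in days} *)
patientData = {
   {23, "A", 10}, {35, "B", 14}, {38, "A", 12},
   {41, "B", 18}, {27, "A", 14}, {44, "B", 16}
   };

(* IntegerPart[age/10] is the decade of life: 23 -> 2, 35 -> 3, 41 -> 4 *)
byAgeGroup = GatherBy[patientData, IntegerPart[First[#]/10] &]
(* {{{23, "A", 10}, {27, "A", 14}},
    {{35, "B", 14}, {38, "A", 12}},
    {{41, "B", 18}, {44, "B", 16}}} *)
```

Because the key is computed inside the pure function, the original records are left unchanged and no extra age-group column has to be added to the data.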
Just as before, statistics can be computed on the grouped data. Here, the data is sorted by drug type and is displayed in a Grid:
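One possible reading of this step is sketched below: the records are sorted by drug, gathered by drug and age group together, and the mean recovery time of each group is shown in a Grid. The data, the helper ageGroup, the column headings, and the choice of Mean as the statistic are all assumptions made for illustration:

```wolfram
(* Hypothetical data: each record is {age, drug, recovery time in days} *)
patientData = {
   {23, "A", 10}, {35, "B", 14}, {38, "A", 12},
   {41, "B", 18}, {27, "A", 14}, {44, "B", 16}
   };

(* Hypothetical helper: decade of life for a given age *)
ageGroup[age_] := IntegerPart[age/10];

(* Sort by drug, then gather records sharing both drug and age group *)
groups = GatherBy[SortBy[patientData, #[[2]] &],
   {#[[2]], ageGroup[#[[1]]]} &];

(* One row per group: drug, age-group label, mean recovery time *)
rows = Table[{x[[1, 2]], ToString[10 ageGroup[x[[1, 1]]]] <> "s",
    Mean[x[[All, -1]]]}, {x, groups}];
(* rows is {{"A", "20s", 12}, {"A", "30s", 12},
            {"B", "30s", 14}, {"B", "40s", 17}} *)

(* Add a header row and draw the table with frames around every cell *)
grid = Grid[Prepend[rows, {"Drug", "Age group", "Mean recovery time"}],
  Frame -> All]
```

Sorting before gathering means the rows for each drug appear together in the finished table.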
For more information on displaying and formatting tables of data, see How to: Work with Tables.