Wolfram Language & System 10.0 (2014) Legacy Documentation
Dataset
Dataset[data]
represents a structured dataset based on a hierarchy of lists and associations.
Details
 Dataset can represent not only full rectangular multidimensional arrays of data, but also arbitrary tree structures, corresponding to data with arbitrary hierarchical structure.
 Dataset[…][op_{1},op_{2},…] is equivalent to Query[op_{1},op_{2},…][Dataset[…]], which applies the sequence of operators at successively deeper levels and yields the resulting Dataset object.
 The operators op_{i} can be any of the following forms:

All, i, i;;j, "key", Key[…]: part operators
Select[…], MaximalBy[…], …: filtering operators
Counts, Total, Mean, Max, …: aggregation operators
Query[…], {op_{1},op_{2},…}, …: subquery operators
Function[…], f: arbitrary functions
 In Dataset[…][op_{1},op_{2},…], the op_{i} are applied at successively deeper levels of the data, but any given operator may be applied either while "descending" into the data or while "ascending" out of it. In general, part specifications and filtering operators are "descending" operators; aggregation operators, subquery operators, and arbitrary functions are "ascending" operators.
 A "descending" operator is applied to corresponding parts of the original dataset, before subsequent operators are applied at deeper levels. Descending operators have the feature that they do not change the structure of deeper levels of the data when applied at a certain level. This ensures that subsequent operators will encounter subexpressions whose structure is identical to the corresponding levels of the original dataset. The simplest descending operator is All, which selects all parts at a given level and therefore leaves the structure of the data at that level unchanged.
 An "ascending" operator is applied after all subsequent operators have been applied to deeper levels. Whereas descending operators correspond to the levels of the original data, ascending operators correspond to the levels of the result. Unlike descending operators, ascending operators do not necessarily preserve the structure of the data they operate on. Unless an operator is specifically recognized to be descending, it is assumed to be ascending.
 The "descending" part operators specify which elements to take at a level before applying any subsequent operators to deeper levels:

All: apply subsequent operators to each part of a list or association
i;;j: take parts i through j and apply subsequent operators to each part
i: take only part i and apply subsequent operators to it
"key", Key[key]: take the value of key in an association and apply subsequent operators to it
Keys: take the keys of an association and apply subsequent operators to each key
Values: take the values of an association and apply subsequent operators to each value
{part_{1},part_{2},…}: take the given parts and apply subsequent operators to each part
 The "descending" filtering operators specify how to rearrange or filter elements at a level before applying subsequent operators to deeper levels:

Select[test]: take only those parts of a list or association that satisfy test
SelectFirst[test]: take the first part that satisfies test
KeySelect[test]: take those parts of an association whose keys satisfy test
MaximalBy[crit], MinimalBy[crit]: take the parts for which the criterion crit is maximal or minimal
SortBy[crit]: sort parts in order of crit
KeySortBy[crit]: sort the parts of an association by their keys, in order of crit
DeleteDuplicatesBy[crit]: take parts that are unique according to crit
DeleteMissing: drop elements with head Missing
 The "ascending" aggregation operators combine or summarize the results of applying subsequent operators to deeper levels:

Total: total all quantities in the result
Min, Max: give the minimum or maximum quantity in the result
Mean, Median, Quantile, …: give a statistical summary of the result
Histogram, ListPlot, …: compute a visualization of the result
Merge[f]: merge common keys of associations in the result using the function f
Catenate: catenate the elements of lists or associations together
Counts: give an association that counts occurrences of values in the result
CountsBy[crit]: give an association that counts occurrences of values according to crit
CountDistinct: give the number of distinct values in the result
CountDistinctBy[crit]: give the number of distinct values in the result according to crit
 The "ascending" subquery operators perform a subquery after applying subsequent operators to deeper levels:

Query[…]: perform a subquery on the result
{op_{1},op_{2},…}: apply multiple operators at once to the result, yielding a list
op_{1}/*op_{2}/*…: apply op_{1}, then apply op_{2} at the same level, and so on
<|key_{1}->op_{1},key_{2}->op_{2},…|>: apply multiple operators at once to the result, yielding an association with the given keys
{key_{1}->op_{1},key_{2}->op_{2},…}: apply different operators to specific parts of the result
 When one or more descending operators are composed with one or more ascending operators, the descending part is applied first, then subsequent operators are applied to deeper levels, and lastly the ascending part is applied to the result.
 The special descending operator GroupBy[spec] will introduce a new association at the level at which it appears, and can be inserted or removed from an existing query without affecting the behavior of other operators.
 The syntax GroupBy["string"] can be used as a synonym for GroupBy[Key["string"]]. The same syntax is also available for SortBy, CountsBy, MaximalBy, MinimalBy, and DeleteDuplicatesBy.
 Where possible, type inference is used to determine whether operations will succeed. Operations that are guaranteed to fail will result in a Failure object being returned.
 When a Dataset operation returns structured data (e.g. a list or association or nested combinations of these), the result will be given in the form of another Dataset object. Otherwise, the result will be given as an ordinary Wolfram Language expression.
 Normal can be used to convert any Dataset object to a combination of lists and associations.
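As a minimal sketch of that conversion (the data here is hypothetical):

```wolfram
ds = Dataset[{<|"a" -> 1|>, <|"a" -> 2|>}];
Normal[ds]  (* the underlying structure: {<|"a" -> 1|>, <|"a" -> 2|>} *)
```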
Examples
Basic Examples (1)
Create a Dataset object from tabular data:
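The original notebook inputs do not survive in this copy; as an illustrative sketch, a Dataset can be built from a list of associations, where keys become column names (the names and values below are hypothetical):

```wolfram
(* each association is one row; keys become column names *)
dataset = Dataset[{
   <|"a" -> 1, "b" -> "x"|>,
   <|"a" -> 2, "b" -> "y"|>,
   <|"a" -> 3, "b" -> "x"|>}]
```

Dataset also accepts a plain list of lists, in which case rows and columns are addressed by position rather than by name.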
A row is merely an association:
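For instance, with a hypothetical dataset of row associations:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[1]          (* the first row, displayed as a Dataset *)
Normal[dataset[1]]  (* the underlying association: <|"a" -> 1, "b" -> "x"|> *)
```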
Take a specific element from a specific row:
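A sketch with the same hypothetical data:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[2, "b"]    (* element in row 2, column "b" *)
dataset[2]["b"]    (* equivalent: take the row, then the column *)
```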
Take the contents of a specific column:
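With the hypothetical dataset above, a whole column is taken with All at the row level:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[All, "a"]    (* the whole "a" column, as a Dataset *)
dataset[[All, "a"]]  (* Part syntax gives the same result *)
```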
Take a specific part within a column:
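One hedged reading of this, on the hypothetical data:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[All, "b"][2]  (* second entry of column "b"; same as dataset[2, "b"] *)
```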
Take a subset of the rows and columns:
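A span picks the rows and a list of keys picks the columns (hypothetical data):

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[1 ;; 2, {"a"}]  (* rows 1 through 2, restricted to column "a" *)
```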
Apply a function to the contents of a specific column:
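Two hedged variants, with f standing for an arbitrary function:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[All, "a", f]      (* f applied to each entry of column "a" *)
dataset[All, {"a" -> f}]  (* the same transformation, kept in place inside the rows *)
```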
Partition the dataset based on a column, applying further operators to each group:
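A sketch using GroupBy on the hypothetical "b" column:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[GroupBy["b"], Total, "a"]  (* per group of "b": total of column "a" *)
dataset[GroupBy["b"], Length]      (* per group: number of rows *)
```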
Apply a function both to each row and to the entire result:
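In the form dataset[f, g], g is applied to every row and f to the list of results; for instance (hypothetical data):

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[Max, #a &]  (* #a & on each row, then Max over the resulting list *)
```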
Apply a function to every element in every row:
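A hedged sketch, with f an arbitrary function:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[All, All, f]  (* f wrapped around every value in every row *)
```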
Apply functions to each column independently:
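Rules from keys to operators act on each column separately (f and g are placeholders):

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[All, {"a" -> f, "b" -> g}]  (* f on column "a", g on column "b", row by row *)
```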
Construct a new table by specifying operators that will compute each column:
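An association of operators builds a new row from each old row; the column names below are made up:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[All, <|"a" -> "a", "twice" -> (2 #a &)|>]  (* keep "a", add a computed column *)
```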
Use the same technique to rename columns:
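Renaming is the special case where each operator is just the old key (new names here are hypothetical):

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[All, <|"alpha" -> "a", "beta" -> "b"|>]  (* same values, new column names *)
```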
Select specific rows based on a criterion:
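For example, on the hypothetical data:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[Select[#a > 1 &]]  (* keep the rows whose "a" value exceeds 1 *)
```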
Take the contents of a column after selecting the rows:
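The filtering operator descends first, then the key is taken from each surviving row:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[Select[#a > 1 &], "b"]  (* "b" entries of the selected rows *)
```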
Take a subset of the available columns after selecting the rows:
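A list of keys restricts each selected row (hypothetical data):

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[Select[#a > 1 &], {"a"}]  (* selected rows, column "a" only *)
```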
Take the first row satisfying a criterion:
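A sketch of both the matching and the non-matching case:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[SelectFirst[#a > 1 &]]  (* the first matching row *)
dataset[SelectFirst[#a > 9 &]]  (* no row matches, so the result is a Missing value *)
```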
Take the rows that give the maximal value of a scoring function:
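For instance, using the string shorthand for the scoring function:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[MaximalBy["a"]]  (* rows with the largest "a"; shorthand for MaximalBy[Key["a"]] *)
```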
Delete rows that duplicate a criterion:
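On the hypothetical data, deduplicating by the "b" column keeps the first row for each distinct value:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[DeleteDuplicatesBy["b"]]  (* one row per distinct "b" value *)
```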
Compose an ascending and a descending operator to aggregate values of a column after filtering the rows:
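A sketch of such a composition, with Select descending and Total ascending at the same level:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
dataset[Select[#b == "x" &] /* Total, "a"]  (* filter the rows, take "a", then total *)
```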
Do the same thing by applying Total after the query:
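Equivalently, the aggregation can be applied to the Dataset returned by the query:

```wolfram
dataset = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 3, "b" -> "x"|>}];
Total[dataset[Select[#b == "x" &], "a"]]
dataset[Select[#b == "x" &], "a"][Total]  (* the same total, via a follow-up query *)
```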