Dataset
Dataset[data]
represents a structured dataset based on a hierarchy of lists and associations.
Details
 Dataset can represent not only full rectangular multidimensional arrays of data, but also arbitrary tree structures, corresponding to data with arbitrary hierarchical structure.
 Depending on the data it contains, a Dataset object displays as a table or grid of elements.
 Functions like Map, Select, etc. can be applied directly to a Dataset by writing Map[f,dataset], Select[dataset,crit], etc.
 Dataset objects can also be queried using a specialized query syntax by writing dataset[op_{1},op_{2},…].
 While arbitrary nesting of lists and associations is possible, two-dimensional (tabular) forms are most commonly used.
 The following table shows the correspondence between the common display forms of a Dataset, the form of Wolfram Language expression it contains, and the logical interpretation of its structure as a table:

{{◻,◻,◻},{◻,◻,◻},{◻,◻,◻},{◻,◻,◻}}   (list of lists)   a table without named rows and columns
{<|"x"→◻,"y"→◻,…|>,<|"x"→◻,"y"→◻,…|>,<|"x"→◻,"y"→◻,…|>}   (list of associations)   a table with named columns
<|"a"→{◻,◻,◻},"b"→{◻,◻,◻},"c"→{◻,◻,◻},"d"→{◻,◻,◻}|>   (association of lists)   a table with named rows
<|"a"→<|"x"→◻,"y"→◻|>,"b"→<|"x"→◻,"y"→◻|>,"c"→<|"x"→◻,"y"→◻|>|>   (association of associations)   a table with named columns and named rows

 Dataset interprets nested lists and associations in a row-wise fashion, so that level 1 (the outermost level) of the data is interpreted as the rows of a table, and level 2 is interpreted as the columns.
 Named rows and columns correspond to associations at level 1 and 2 respectively, whose keys are strings that contain the names. Unnamed rows and columns correspond to lists at those levels.
 The syntax dataset[[parts]] or Part[dataset,parts] can be used to extract parts of a Dataset.
 The parts that can be extracted from a Dataset include all ordinary specifications for Part.
 Unlike the ordinary behavior of Part, if a specified subpart of a Dataset is not present, Missing["PartAbsent",…] will be produced in that place in the result.
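For illustration, these extraction forms can be sketched on a small hypothetical dataset (the column names and values below are invented):

```wolfram
ds = Dataset[{<|"x" -> 1, "y" -> 10|>, <|"x" -> 2, "y" -> 20|>}];

ds[[1]]           (* first row, as an association *)
ds[[All, "x"]]    (* the "x" column *)
ds[[2, "y"]]      (* a single cell: 20 *)
ds[[1, "z"]]      (* absent part: yields Missing["PartAbsent", ...] *)
```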
 The following part operations are commonly used to extract rows from tabular datasets:

dataset[["name"]] extract a named row (if applicable) dataset[[{"name_{1}",…}]] extract a set of named rows dataset[[1]] extract the first row dataset[[n]] extract the n row dataset[[1]] extract the last row dataset[[m;;n]] extract rows m through n dataset[[{n_{1},n_{2},…}]] extract a set of numbered rows  The following part operations are commonly used to extract columns from tabular datasets:

dataset[[All,"name"]] extract a named column (if applicable) dataset[[All,{"name_{1}",…}]] extract a set of named columns dataset[[All,1]] extract the first column dataset[[All,n]] extract the n column dataset[[All,1]] extract the last column dataset[[All,m;;n]] extract columns m through n dataset[[All,{n_{1},n_{2},…}]] extract a subset of the columns  Like Part, row and column operations can be combined. Some examples include:

dataset[[n,m]]                   take the cell at the nth row and mth column
dataset[[n,"colname"]]           extract the value of the named column in the nth row
dataset[["rowname","colname"]]   take the cell at the named row and column

 The following operations can be used to remove the labels from rows and columns, effectively turning associations into lists:

dataset[[Values]]        remove labels from rows
dataset[[All,Values]]    remove labels from columns

 The query syntax dataset[op_{1},op_{2},…] can be thought of as an extension of Part syntax that allows aggregations and transformations to be applied, as well as taking subsets of data.
 Some common forms of query include:

dataset[f]              apply f to the entire table
dataset[All,f]          apply f to every row in the table
dataset[All,All,f]      apply f to every cell in the table
dataset[f,n]            extract the nth column, then apply f to it
dataset[f,"name"]       extract the named column, then apply f to it
dataset[n,f]            extract the nth row, then apply f to it
dataset["name",f]       extract the named row, then apply f to it
dataset[{n→f}]          selectively map f onto the nth row
dataset[All,{n→f}]      selectively map f onto the nth column

 Some more specialized forms of query include:

dataset[Counts,"name"] give counts of different values in the named column dataset[Count[value],"name"] give number of occurences of value in the named column dataset[MinMax,"name"] give minimum and maximum values in the named column dataset[Mean,"name"] give the mean value of the named column dataset[Total,"name"] give the total value of the named column dataset[Select[h]] extract those rows that satisfy condition h dataset[Select[h]/*Length] count the number of rows that satisfy condition h dataset[Select[h],"name"] select rows, then extract the named column from the result dataset[Select[h]/*f,"name"] select rows, extract the named column, then apply f to it  In , the query operators are effectively applied at successively deeper levels of the data, but any given one may be applied either while "descending" into the data or while "ascending" out of it.
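A few of the query forms above, sketched on a small hypothetical dataset:

```wolfram
ds = Dataset[{
   <|"name" -> "Alice", "age" -> 30|>,
   <|"name" -> "Bob", "age" -> 25|>,
   <|"name" -> "Carol", "age" -> 35|>}];

ds[Length]                          (* number of rows: 3 *)
ds[Mean, "age"]                     (* mean of the "age" column: 30 *)
ds[Counts, "name"]                  (* counts of the values in "name" *)
ds[Select[#age > 28 &]]             (* rows with age > 28 *)
ds[Select[#age > 28 &] /* Length]   (* how many such rows: 2 *)
```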
 The operators that make up a Dataset query fall into one of the following broad categories with distinct ascending and descending behavior:

All, i, i;;j, "key", …      descending part operators
Select[f], SortBy[f], …     descending filtering operators
Counts, Total, Mean, …      ascending aggregation operators
Query[…], …                 ascending subquery operators
Function[…], f              ascending arbitrary functions

 A descending operator is applied to corresponding parts of the original dataset, before subsequent operators are applied at deeper levels.
 Descending operators have the feature that they do not change the structure of deeper levels of the data when applied at a certain level. This ensures that subsequent operators will encounter subexpressions whose structure is identical to the corresponding levels of the original dataset.
 The simplest descending operator is All, which selects all parts at a given level and therefore leaves the structure of the data at that level unchanged. All can safely be replaced with any other descending operator to yield another valid query.
 An ascending operator is applied after all subsequent ascending and descending operators have been applied to deeper levels. Whereas descending operators correspond to the levels of the original data, ascending operators correspond to the levels of the result.
 Unlike descending operators, ascending operators do not necessarily preserve the structure of the data they operate on. Unless an operator is specifically recognized to be descending, it is assumed to be ascending.
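The contrast between descending and ascending application can be sketched as follows (hypothetical data):

```wolfram
ds = Dataset[{<|"x" -> 1|>, <|"x" -> 2|>, <|"x" -> 3|>}];

(* All is descending: structure is preserved, then "x" is taken at the next level *)
ds[All, "x"]     (* the column {1, 2, 3} *)

(* Total is ascending: it is applied after the deeper levels produce {1, 2, 3} *)
ds[Total, "x"]   (* 6 *)
```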
 The descending part operators specify which elements to take at a level before applying any subsequent operators to deeper levels:

All                  apply subsequent operators to each part of a list or association
i;;j                 take parts i through j and apply subsequent operators to each part
i                    take only part i and apply subsequent operators to it
"key", Key[key]      take the value of a key in an association and apply subsequent operators to it
Values               take the values of an association and apply subsequent operators to each value
{part_{1},part_{2},…}    take the given parts and apply subsequent operators to each part

 The descending filtering operators specify how to rearrange or filter elements at a level before applying subsequent operators to deeper levels:

Select[test]                              take only those parts of a list or association that satisfy test
SelectFirst[test]                         take the first part that satisfies test
KeySelect[test]                           take those parts of an association whose keys satisfy test
TakeLargestBy[f,n], TakeSmallestBy[f,n]   take the n elements for which f is largest or smallest
MaximalBy[crit], MinimalBy[crit]          take the parts for which crit is maximal or minimal
SortBy[crit]                              sort parts in order of crit
KeySortBy[crit]                           sort parts of an association based on their keys, in order of crit
DeleteDuplicatesBy[crit]                  take parts that are unique according to crit
DeleteMissing                             drop elements with head Missing

 The syntax op_{1}/*op_{2} can be used to combine two or more filtering operators into one operator that still operates at a single level.
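A sketch of filtering operators, including two combined with /* at a single level (hypothetical data):

```wolfram
ds = Dataset[{<|"x" -> 3|>, <|"x" -> 1|>, <|"x" -> 2|>}];

ds[SortBy["x"]]                       (* rows sorted by the "x" column *)
ds[Select[#x > 1 &] /* SortBy["x"]]   (* filter, then sort, as one operator *)
```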
 The ascending aggregation operators combine or summarize the results of applying subsequent operators to deeper levels:

Total                             total all quantities in the result
Min, Max                          give the minimum or maximum quantity in the result
Mean, Median, Quantile, …         give a statistical summary of the result
Histogram, ListPlot, …            calculate a visualization of the result
Merge[f]                          merge common keys of associations in the result using function f
Catenate                          catenate the elements of lists or associations together
Counts                            give an association that counts occurrences of values in the result
CountsBy[crit]                    give an association that counts occurrences of values according to crit
CountDistinct                     give the number of distinct values in the result
CountDistinctBy[crit]             give the number of distinct values in the result according to crit
TakeLargest[n], TakeSmallest[n]   take the largest or smallest n elements

 The syntax op_{1}/*op_{2} can be used to combine two or more aggregation operators into one operator that still operates at a single level.
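A few aggregation operators applied to a column of a hypothetical dataset:

```wolfram
ds = Dataset[{<|"x" -> 1|>, <|"x" -> 1|>, <|"x" -> 2|>}];

ds[Total, "x"]            (* 4 *)
ds[Counts, "x"]           (* <|1 -> 2, 2 -> 1|> *)
ds[CountDistinct, "x"]    (* 2 *)
ds[TakeLargest[2], "x"]   (* the two largest values *)
```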
 The ascending subquery operators perform a subquery after applying subsequent operators to deeper levels:

Query[…]                            perform a subquery on the result
{op_{1},op_{2},…}                       apply multiple operators at once to the result, yielding a list
<|key_{1}→op_{1},key_{2}→op_{2},…|>         apply multiple operators at once to the result, yielding an association with the given keys
{key_{1}→op_{1},key_{2}→op_{2},…}           apply different operators to specific parts in the result

 When one or more descending operators are composed with one or more ascending operators (e.g. Select[h]/*Total), the descending part will be applied, then subsequent operators will be applied to deeper levels, and lastly the ascending part will be applied to the result at that level.
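The list and association subquery forms can be sketched as follows (hypothetical data):

```wolfram
ds = Dataset[{<|"x" -> 1, "y" -> 10|>, <|"x" -> 2, "y" -> 20|>}];

ds[{Min, Max}, "x"]                     (* a list of aggregates: {1, 2} *)
ds[<|"lo" -> Min, "hi" -> Max|>, "x"]   (* <|"lo" -> 1, "hi" -> 2|> *)
```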
 The special descending operator GroupBy[spec] will introduce a new association at the level at which it appears and can be inserted or removed from an existing query without affecting subsequent operators.
 Functions such as CountsBy, GroupBy, and TakeLargestBy normally take another function as one of their arguments. When working with associations in a Dataset, it is common to use this "by" function to look up the value of a column in a table.
 To facilitate this, Dataset queries allow the syntax "string" to mean Key["string"] in such contexts. For example, the query operator GroupBy["string"] is automatically rewritten to GroupBy[Key["string"]] before being executed.
 Similarly, the expression GroupBy[dataset,"string"] is rewritten as GroupBy[dataset,Key["string"]].
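For example, grouping by a named column with this shorthand might look like the following (the column names are hypothetical):

```wolfram
ds = Dataset[{
   <|"dept" -> "A", "pay" -> 1|>,
   <|"dept" -> "B", "pay" -> 2|>,
   <|"dept" -> "A", "pay" -> 3|>}];

(* "dept" is rewritten to Key["dept"] automatically *)
ds[GroupBy["dept"], Total, "pay"]   (* <|"A" -> 4, "B" -> 2|> *)
```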
 Where possible, type inference is used to determine whether a query will succeed. Operations that are inferred to fail will result in a Failure object being returned without the query being performed.
 By default, if any messages are generated during a query, the query will be aborted and a Failure object containing the message will be returned.
 When a query returns structured data (e.g. a list or association, or nested combinations of these), the result will be given in the form of another Dataset object. Otherwise, the result will be given as an ordinary Wolfram Language expression.
 For more information about special behavior of Dataset queries, see the function page for Query.
 Normal can be used to convert any Dataset object to its underlying data, which is typically a combination of lists and associations.
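As a brief sketch of this conversion:

```wolfram
ds = Dataset[{<|"x" -> 1|>, <|"x" -> 2|>}];

Normal[ds]   (* the underlying data: {<|"x" -> 1|>, <|"x" -> 2|>} *)
```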
 Dataset objects can be exported by writing Export["file.ext",dataset] or ExportString[dataset,"fmt"]. The following formats are supported:

"CSV" a commaseparated table of values "TSV" a tabseparated table of values "JSON" a JSON expression in which associations become objects "M" a humanreadable Wolfram Language expression "MX" a packed binary protocol  SemanticImport can be used to import files as Dataset objects.
Examples
Basic Examples
Create a Dataset object from tabular data:
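The original inputs are not preserved in this copy; a minimal sketch of creating a Dataset from tabular data (the rows below are invented for illustration):

```wolfram
ds = Dataset[{
   <|"name" -> "Alice", "age" -> 30, "dept" -> "A"|>,
   <|"name" -> "Bob", "age" -> 25, "dept" -> "B"|>,
   <|"name" -> "Carol", "age" -> 35, "dept" -> "A"|>}]
(* displays as a table with named columns *)
```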
A row is merely an association:
Take a specific element from a specific row:
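Sketches of the two extractions above, on hypothetical data:

```wolfram
ds = Dataset[{<|"name" -> "Alice", "age" -> 30|>, <|"name" -> "Bob", "age" -> 25|>}];

ds[[1]]          (* a row is merely an association *)
ds[[1, "age"]]   (* a specific element from a specific row: 30 *)
ds[[2, 1]]       (* the same kind of access by position: "Bob" *)
```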
Take the contents of a specific column:
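A sketch of column extraction on hypothetical data:

```wolfram
ds = Dataset[{<|"name" -> "Alice", "age" -> 30|>, <|"name" -> "Bob", "age" -> 25|>}];

ds[[All, "age"]]   (* the "age" column: {30, 25} *)
ds[[All, 2]]       (* the same column, by position *)
```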
Take a specific part within a column:
Take a subset of the rows and columns:
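Sketches of the two part-extraction examples above, on hypothetical data:

```wolfram
ds = Dataset[{
   <|"name" -> "Alice", "age" -> 30|>,
   <|"name" -> "Bob", "age" -> 25|>,
   <|"name" -> "Carol", "age" -> 35|>}];

ds[[2, "age"]]           (* a specific part within a column: 25 *)
ds[[1 ;; 2, {"name"}]]   (* a subset of the rows and columns *)
```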
Apply a function to the contents of a specific column:
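A sketch of applying functions to a column, on hypothetical data:

```wolfram
ds = Dataset[{<|"x" -> 1|>, <|"x" -> 2|>, <|"x" -> 3|>}];

ds[Total, "x"]   (* 6 *)
ds[Mean, "x"]    (* 2 *)
```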
Partition the dataset based on a column, applying further operators to each group:
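A sketch of partitioning by a column and aggregating each group (hypothetical data):

```wolfram
ds = Dataset[{
   <|"dept" -> "A", "pay" -> 1|>,
   <|"dept" -> "B", "pay" -> 2|>,
   <|"dept" -> "A", "pay" -> 3|>}];

ds[GroupBy["dept"]]                (* one group of rows per department *)
ds[GroupBy["dept"], Mean, "pay"]   (* <|"A" -> 2, "B" -> 2|> *)
```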
Apply a function both to each row and to the entire result:
Apply a function to every element in every row:
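Sketches of the two mapping examples above, on hypothetical data:

```wolfram
ds = Dataset[{<|"x" -> 1, "y" -> 2|>, <|"x" -> 3, "y" -> 4|>}];

ds[Total, Total]       (* total each row, then total those results: 10 *)
ds[All, All, 10 # &]   (* apply a function to every element in every row *)
```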
Apply functions to each column independently:
Construct a new table by specifying operators that will compute each column:
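A sketch of constructing a new table from per-column operators (the new column names are invented):

```wolfram
ds = Dataset[{<|"x" -> 1, "y" -> 10|>, <|"x" -> 2, "y" -> 20|>}];

ds[All, <|"sum" -> (#x + #y &), "diff" -> (#y - #x &)|>]
(* each row becomes an association with computed "sum" and "diff" columns *)
```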
Use the same technique to rename columns:
Select specific rows based on a criterion:
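A sketch of row selection, using the documented Select forms on hypothetical data:

```wolfram
ds = Dataset[{<|"name" -> "Alice", "age" -> 30|>, <|"name" -> "Bob", "age" -> 25|>}];

ds[Select[#age > 28 &]]           (* rows satisfying the criterion *)
ds[Select[#age > 28 &], "name"]   (* then extract a column: {"Alice"} *)
```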
Take the contents of a column after selecting the rows:
Take a subset of the available columns after selecting the rows:
Take the first row satisfying a criterion:
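A sketch of taking the first matching row, on hypothetical data:

```wolfram
ds = Dataset[{<|"x" -> 1|>, <|"x" -> 2|>, <|"x" -> 3|>}];

ds[SelectFirst[#x > 1 &]]   (* the first row satisfying the criterion: <|"x" -> 2|> *)
ds[SelectFirst[#x > 5 &]]   (* no match: yields a Missing value *)
```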
Take the rows that give the maximal value of a scoring function:
Give the top 3 rows according to a scoring function:
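Sketches of the two scoring-function examples above, on hypothetical data:

```wolfram
ds = Dataset[{<|"x" -> 1|>, <|"x" -> 3|>, <|"x" -> 2|>}];

ds[MaximalBy[#x &]]         (* the rows with the maximal score *)
ds[TakeLargestBy["x", 2]]   (* the top 2 rows according to the "x" column *)
```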
Delete rows that duplicate a criterion:
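A sketch of deduplication by a criterion, on hypothetical data:

```wolfram
ds = Dataset[{
   <|"dept" -> "A", "x" -> 1|>,
   <|"dept" -> "A", "x" -> 2|>,
   <|"dept" -> "B", "x" -> 3|>}];

(* keeps the first row seen for each distinct "dept" value *)
ds[DeleteDuplicatesBy["dept"]]
```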
Compose an ascending and a descending operator to aggregate values of a column after filtering the rows:
Do the same thing by applying Total after the query:
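Sketches of the two equivalent approaches above, on hypothetical data:

```wolfram
ds = Dataset[{<|"x" -> 1|>, <|"x" -> 2|>, <|"x" -> 3|>}];

ds[Select[#x > 1 &] /* Total, "x"]   (* filter rows, then total column "x": 5 *)
Total[ds[Select[#x > 1 &], "x"]]     (* the same result, applying Total after the query *)
```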