Relational Databases Quick Start


Purpose of This Tutorial

This tutorial explains how to use the Entity framework to construct and execute queries for relational databases.

It is intended as an example-based quick-start guide, rather than a complete reference. It covers most frequent use cases and illustrates those with concrete examples.

After going through this tutorial, one can expect to have gained the core practical understanding of how to construct Entity framework queries and execute them. Some more advanced material, subtleties and corner cases have been intentionally left out, to keep the tutorial of a reasonably small size and introductory level. However, most core query-building machinery is covered and illustrated with examples, which should be sufficient to get started.

The Role of Relational Database Connectivity

Relational databases are an important and widely used means for information storage, processing and lookup. The associated technology has become very mature over the last decades, and modern relational database backends have exceptional capabilities for storing and processing structured data, from small to very large data volumes. It is important, therefore, to be able to seamlessly integrate relational databases into one's workflows, for doing work in a number of areas, from data science to web applications.

One important aspect of this is the ability to process even extremely large datasets on the database side, "out of core". Most common relational database backends have been heavily optimized and apply cutting-edge technology to be able to process such large datasets very efficiently.

In the context of the Wolfram Language, the ability to offload the computations on large datasets to a database engine is particularly important, since many such computations do not require the full computational power of the Wolfram Language, while the size of the resulting dataset can be considerably reduced, so that it can be further processed with ease within the Wolfram Language.

Relational Database Connectivity in the Wolfram Language

General considerations

Direct integration of relational databases into one's work may get complicated for a number of reasons. First, many domain experts do not have the knowledge of SQL that is required for direct use of relational databases. Second, there are (often subtle) differences between SQL dialects and supported feature sets for different database backends in common use. Third, one has to build data pipelines manually, connecting the database to one's programming environment. For these reasons, tight integration of relational databases directly into the language is very beneficial. It allows the users to focus on solving their problems rather than on providing the technical infrastructure.

The very high-level symbolic nature of the Wolfram Language sets stiff requirements for the framework that integrates relational databases into the language. It has to be high level, symbolic and idiomatic to use, yet exposing enough structure to not limit the query building and other capabilities too much with respect to raw SQL. In particular, idiomatic Wolfram Language programming gravitates toward functional programming style, function composition and immutabilitythe features that the connectivity framework is therefore required to embrace and support.

The role of the Entity framework

As it turns out, the Entity framework does largely satisfy the requirements outlined. While there are no objects in the OOP sense in the Wolfram Language, the Entity framework for the Wolfram Language is, in the context of relational databases, in many ways similar to what object-relational mappers (ORMs) are for other languages. Somewhat oversimplifying, entities map to single rows of database tables, entity classes map to database tables (existing or virtual), and entity properties map to database columns.

As a result, for the vast number of cases when one needs to query a database, it is possible to stay within the Wolfram Language and communicate with the database via high-level Entity framework queries, constructed using Entity framework functional query language. The framework is then responsible for translating the query into an appropriate SQL dialect, executing it on the database side and returning the results back to the user in the appropriate format.

Entity framework and DatabaseLink

For a long time, the main tool to interact with relational databases in the Wolfram Language has been DatabaseLink. It is built on top of J/Link and JDBC and provides a full-featured toolbox to interact with relational databases.

It is important to understand the differences between the DatabaseLink toolbox and the technology discussed in this tutorial. The most important differences are in the the level of abstraction offered by either one of these tools, and the supported features for interacting with databases.

The toolbox provided by DatabaseLink is rather low level: while it has some features that allow the database queries to be constructed symbolically in simple cases, the majority of real-world queries would have to be written as SQL strings when used with DatabaseLink. This has a number of implications for the user, from the need for familiarity with SQL (and a particular SQL dialect that corresponds to a specific database backend used) to the inability to use the strong points of the Wolfram Language in terms of its symbolic capabilities and high-level abstractions. All this leads to an incomplete, low-level integration of the databases into the Wolfram Language.

The technology based on the Entity framework has been designed from the ground up to provide a high-level seamless integration of relational databases into the Wolfram Language and contains features (such as symbolic query language, relations, automatic SQL generation and backend specialization, built-in type-checking, etc.) that enable considerably more powerful and high-level workflows involving relational databases in the Wolfram Language.

On the other hand, DatabaseLink supports a number of core features that are required for common workflows involving databases, such as write operations (SQL INSERT / UPDATE / DELETE), transaction support, control over timeouts, connection pools, etc. It also provides low-level access to database result sets for efficiency purposes. All of these the Entity framework-based technology is currently lacking, so one can say that DatabaseLink is at present more feature-complete.

There is currently no deep interoperability between the two technologies, other than the fact that both are integrated into the Wolfram Language. The level of interoperability may become better in future versions of the Wolfram Language.

The Organization of the Tutorial

A brief breakdown of the material is as follows.

The first section explains the standard workflow to prepare for working with relational databases: establishing a connection, creating and registering a database-backed EntityStore object.

The second section reviews and illustrates key features of the Entity framework in the context of relational databases.

The third section goes through core query-building blocks that the Entity framework provides to build more complex queries.

The fourth section contains a brief coverage of some additional useful constructs and tools, which may be considered somewhat more advanced, but are extremely useful in practice.

The fifth section illustrates what kind of SQL can be typically generated for various query-building primitives, supported by the Entity framework.

The sixth section describes how one can programmatically generate queries and illustrates with examples why this may be beneficial.

The seventh section describes a few practical techniques of query construction.

The eighth section contains a brief coverage of error handling and typical errors one may encounter when constructing and running queries.

The final section contains several examples of more complex queries, illustrating how one can use different query building blocks together to construct more sophisticated queries.

The Sample Database

Examples in this tutorial are based on sample public domain database Classic Models (in particular its SQLite version), which is a retailer of scale models of classic cars database.

The database consists of eight tables:

Connecting to a Relational Database

Connecting to the Database

To start working with a relational database, one has first to go through the following steps:

Working with RelationalDatabase Object

As described in the previous section, one of the steps required to create the database-backed EntityStore consists of creating the RelationalDatabase object. However, this object is also useful in its own right. It contains information about database schema (tables, columns, constraints, etc.) and can be used to both visually inspect and programmatically extract parts of this information that one is interested in.

This creates a RelationalDatabase object:
Click for copyable input

The formatting for the RelationalDatabase object makes it easy to visually inspect the database structure, using hierarchical group openers, which can be expanded to inspect more details about a given table or column:

Click for copyable input

One can also extract this information programmatically.

The following lists all table names for a given RelationalDatabase object:
Click for copyable input
The following tests if a table with a given name exists:
Click for copyable input
The following lists all column names for a given table:
Click for copyable input
The following tests if a column with a given name exists:
Click for copyable input
The following lists all properties for a given table column that RelationalDatabase object is aware of:
Click for copyable input
One can get their values as follows:
Click for copyable input

For more details about the various properties and methods, consult the reference page for RelationalDatabase object.

Entity Framework and Relational Databases


This section covers important core operations supported by the Entity framework, in the way they are used in the context of relational databases. It is not primarily focused on query construction, which has a separate section dedicated to it. Rather, it lays the groundwork and discusses other important aspects, many of which are prerequisites for effective work with relational databases through the Entity framework.

To use the Entity framework effectively in one's work with relational databases, it is important to have at least a basic understanding of how the concepts and constructs of the Entity framework correspond to the core concepts and constructs of relational databases. This topic is discussed first.

Query execution and the so-called "resolvers" (functions that are used to actually transform symbolically represented queries into specific results by initiating the process of query compilation and execution and returning the results) are discussed next.

Next, computed properties and EntityFunction are covered. These are very important building blocks that allow one to define and compute new properties on the fly, where such new properties may require complex computations performed on the database side.

In a number of cases, being able to work with single entities is important. This can be useful for various purposes, from better visualization and understanding of the results to query construction (for debugging and prototyping purposes, it is very valuable to be able to execute parts of the query on a single entity). A brief account of this topic is given next.

For relational databases, the notion of a primary key for a database table is a central one. The Entity framework counterpart for this is the CanonicalName property for single entities. The other part that, together with the canonical name, defines a single entity, is a type of that entity. These topics are considered important enough to merit a separate section.

While most modern workflows involving relational databases deal with database tables that do have a primary key, cases when a table either does not have one, or when it does but that is not enforced on the database schema level, are not unheard of. These are even more practically important in the context of the Entity framework, since for such tables/entity types, only a part of the Entity framework's functionality will work. This issue is discussed next.

While this tutorial does not focus on types and the type system implicitly involved in the interaction with relational databases, there is one type distinction that is very important to understand for effective work. Namely, entity types may contain properties that are entity valued and entity-class valued. In this tutorial, such properties are called relations and are added by the framework to the set of core properties (properties directly corresponding to database table columns) for each entity type.

The last topic covered in this section concerns missing values and ways in which one may encounter those in results, in the context of relational databases.

Approximate Mapping between the Entity Framework and SQL

Since the Entity framework is used in the Wolfram Languagein particular, to represent relational databasesthere clearly has to be a mapping for the core concepts and constructs between the Entity framework and relational databases/SQL.

For many features of the Entity framework, such a mapping is rather direct. In some ways, however, the Entity framework represents a richer data model. For example, for the in-memory use of the Entity framework, entity properties can have arbitrary Wolfram Language expressions as their valueswhich, if used directly, would not comply even with the first normal form for relational databases (e.g. when entity properties are List valued). As another example, the Entity framework is capable of representing types with an infinite number of entities, something that relational databases cannot easily do. Also, the set of easily possible computations is more limited for relational databases w.r.t. the Wolfram Language, the latter naturally having much richer computational functionality available "out of the box".

Restrictions of the relational model have their advantages as well, such as the ability to provide strong guarantees for data consistency, transactions and other related useful features of ACID-compliant databases. These restrictions also mean similar restrictions imposed on the Entity framework for database-backed entity stores.

That said, the mapping between the core relational and Entity framework constructs is relatively straightforward. It is summarized by the following table.

Database (schema)EntityStoreA collection of entity types (database tables)
Database tableEntity typeA (handle to a) collection of entities with a similar set of properties (table rows)
Database table rowEntity (single entity)A (handle to a) set of properties/values, representing a single "thing" (table row) with a unique identity
Database table field (column name)EntityPropertySpecific property, usually having a specific type
Primary keyCanonicalNameA property or set of properties, guaranteed to be unique (when taken together for a set of properties case) for a given entity type (db table)
Foreign keyEntity-valued propertyEntity property whose value is a single entity (can be of the same or different entity type). For databases, a field that points to a unique row in a table (can be the same or different table).
(Derived) tableEntityClassA (handle to a) collecttion of entities of the same type (either registered or defined by the query). On the database side, the derived table is a virtual table/subquery, used in a FROM clause.

As explained previously, this mapping has a certain asymmetry. It makes it straightforward to represent existing relational databases as entity stores. However, not all existing in-memory entity stores can be easily mapped to relational databases without creating extra abstraction layers. In particular, the following are some of the restrictions that exist:

Executing Queries with EntityValue and EntityList

There are just two "resolvers" (commands that can be used to execute a query) in the Entity framework: EntityValue and EntityList.

The main and most commonly used one is EntityValue. It is typically used to execute a given query and get the results in a variety of forms. In the context of relational databases, queries typically represent (virtual) database tables, which in the Entity framework correspond to (virtual) entity types. The role of EntityValue in this context is to compile a given query to the suitable SQL dialect, execute it on the database, and extract values for a certain set of entity properties (resulting table columns in database terms) in various forms (lists or associations of values).

Following are a few simple examples of use of EntityValue.

The following extracts values for several properties of registered type "offices":
Click for copyable input
The result can be obtained in a different form, for example, as a list of associations, with property names retained:
Click for copyable input
For extracted properties, one can use the full EntityProperty form in place of short string names, in which case the resulting keys in the data association will also be EntityProperty[] expressions:
Click for copyable input
One can also obtain the results in the form of Dataset:
Click for copyable input

The full set of possible modifiers for the third argument of EntityValue can be found in the documentation for EntityValue.

In some cases, one may want to obtain a list of entities contained in a given entity type/class. This can be accomplished with EntityList.

The following code returns a list of entities contained in the "offices" entity type:
Click for copyable input

Note that, with an exception of certain special cases to be discussed in subsequent sections, the ordering of entities in the result is not guaranteed to be the same for repeated calls to EntityList or EntityValue.

Computed Properties and EntityFunction

In addition to extracting existing properties, one can also extract computed propertiesthat is, properties created on the fly, which may require complex computations on the database side. Such properties must be expressed using EntityFunction. The semantics of EntityFunction in this context is similar to Function, with a few important differences to be covered in subsequent sections. One can think of EntityFunction as a construct similar to EntityProperty that (in the context of this section) takes a single entity and returns the result of some computation performed with that entity. It is important that at present EntityFunction can only return a scalar, but cannot return e.g. a list of values, an entity or an entity class.

Following is an example of the use of EntityFunction to retrieve a property that requires computation on the database side.

The following query extracts the property "officeCode" and computed property, representing a full string office address, for every office in the "offices" type:
Click for copyable input

EntityFunction has the HoldAll attribute, just like Function, to prevent premature evaluation of its variables and body. The body of EntityFunction is an expression, where one can use a limited set of primitives that EntityFunction can understand and compile to SQL. The full list of such primitives can be found in EntityFunction documentation, while many more examples of the practical use of EntityFunction can be found in subsequent sections.

Working with Single Entities

Single entities are an important concept and building block in the Entity framework. The ability to work with single entities adds interactivity to one's workflow and often allows one to get a better understanding of the data, even in cases when the end result should be a query that works on many entities (entity class).

It is important to understand that single entities are essentially handles (or references) to the actual data. They do not contain data, except for their type and canonical name. As such, they are lazy: every time one needs to extract data from an entity, one has to run a new query against it. This lazy nature of entities is a very useful feature of the Entity framework, since it allows one to work with entities in a more abstract fashion. However, it comes with a few caveats that one should be aware of. For example, should some properties of a particular entity change in between two successive executions of the same query, the returned results will generally be different too, reflecting those changes. It also may happen that an entity that was there in a given entity class at some point has been deleted and no longer exists. Of course, these complications can only occur for the data that is changing in time.

Following are a few examples of how one can query single entities.

Consider a specific single entity:
Click for copyable input
Property values can be extracted using EntityValue:
Click for copyable input
One can also use the special lookup syntax:
Click for copyable input
As for entity classes, one can use modifiers to control the shape of the result:
Click for copyable input

It is important to realize that one can extract computed properties for a single entity, using EntityFunction. This can be very useful for both quick lookups and prototyping more complex queries.

For example, this computes the formatted string out of the "city" and "state" properties for a given office, on the database side:
Click for copyable input
The same can also be accomplished as:
Click for copyable input

The last example also illustrates the property-like nature of EntityFunction.

Entity Types, Uniqueness of Single Entities and CanonicalName

In the Entity framework, each entity must be unique within its entity type/entity class. The other way to say it is that each entity must have a unique identity, and no entity class/entity type is allowed to contain duplicate entities.

Entity type and canonical name

Syntactically, an entity is represented by an expression containing two parts: entity type and entity unique identifier, that is called canonical name.

Looking at the InputForm of entities:
Click for copyable input

One can extract the canonical name of an entity also using CanonicalName.

The following extracts the canonical name of a single entity:
Click for copyable input
The CanonicalName function also works on a list of entities:
Click for copyable input

The canonical name is typically a number, a string or a list of these.

Entities of type "orderdetails" have pairs of integer order number and string product code as their canonical names:
Click for copyable input

The entity type of an entity is always the first argument of Entity[], but not always a string. In particular, in cases when the type is not the registered type, but a "run-time" type, implicitly defined by an Entity framework query, it will be that query itself.

The following query defines a complex type:
Click for copyable input
Take a look at a single entity of that type:
Click for copyable input
The type of this entity is defined by a query:
Click for copyable input

For entity types that correspond to the database tables in the database, the canonical name for an entity is exactly the primary key for the corresponding database table.

When can the canonical name be a list?

In the context of relational databases, the canonical name for an entity can be a list in one of the following cases (or their combination):

Single Entities and Database Tables with No Primary Key

The case when certain database tables do not have the primary key constraints in the database schema might not be typical but is important in practice. Note that not in all such cases is a column with unique values actually absent from the table. However, at present there is no way to indicate such a column for the Entity framework to use as a primary key for such tables: all the information about the primary keys is currently obtained at database inspection time (when the RelationalDatabase object is constructed) and entirely from the existing database constraints.

What this means for the Entity-framework-based workflows is that for such tables, single entities can not be defined, since there is no clear way to attach a canonical name to a database row in such a table. This does not mean that one cannot do anything useful with such tables, but certain parts of the Entity framework's functionality will not work with those.

To illustrate the limitations existing for such tables in terms of Entity framework functionality, the Titanic database will be used, which is an SQLite database hosted in the Wolfram Cloud and based on the following Titanic dataset:

Click for copyable input

As can be seen from the structure of this dataset, there is no field or combination of fields there that would naturally work as a primary key. There is even no guarantee that there are no duplicate records in the dataset, since there are not even passenger names present in the data, but just the passenger's travel class, age, sex and whether or not they survived. Still, this dataset is a useful source of information.

A sample workflow illustrates some of the Entity framework operations that are and are not possible for such datasets.

The first step is to create a RelationalDatabase object:

Click for copyable input

There should be no problems with this step; the database and a single table it has are correctly identified.

The next step is to create and register the EntityStore object based on this database:

Click for copyable input

As can be seen, a warning is issued by EntityStore that single entities cannot work for this type/database table. In particular, EntityList and some forms of EntityValue will not work.

EntityList does not work for types without a CanonicalName (tables without a primary key):
Click for copyable input
EntityValue for certain modifiers does not work either:
Click for copyable input

However, other forms, not involving single entities, do work.

It is still possible to extract values for specific properties, as long as the form of the resulting data does not involve single entities:
Click for copyable input

And so do many useful queries.

The following query selects all surviving passengers in first class with age <= 15:
Click for copyable input
The following query introduces a coarse-grained age class for passengers, where passengers from 0 to 20 years old are in the first group, from 20 to 40 years old in the second, etc. It then computes the fraction of surviving passengers for each passenger class, age group and sex, and sorts the results by that fraction in descending order:
Click for copyable input

To summarize this section:

Entity-Valued and Entity-Class-Valued Properties

Properties can be entity valued or entity-class valued. Such properties do not correspond to any of the database table columns of the corresponding database table, but have been added by the framework and represent related tables.

Following are a few examples of such properties for the "employees" entity type.

The following gets a value of the "employees" property, which is an (implicit) entity class:
Click for copyable input
One can expand this entity class into a list of entities by using EntityList:
Click for copyable input

For an example of an entity-valued property, consider one of the employees.

This picks the first employee for a given office and extracts entity properties and values for that employee:
Click for copyable input
In addition to the "officeCode" property, which in database terms is the foreign key to the "offices" table, there is a generated property "offices" whose value is the entity representing the office this employee works for:
Click for copyable input

Note that such entity- and entity-class-valued properties have no direct database column origin, but have been generated by the framework based on the database schema (relations/foreign keys information).

One important observation to be made at this point is that entity-class-valued properties are not automatically resolved to lists of entities, so that EntityList has to be applied explicitly to their values if one wants to get the list of entities as a result.

Generated properties are further discussed in the section on relations.

Missing Values in Results and Invalid Property Names

For some entities, certain properties may not have values in the database table (values may be NULL). On the Wolfram Language side, this corresponds to Missing[] values.

An example will illustrate this.

Consider one of the offices again:
Click for copyable input
This office is in Paris, France, and thus has no value for the "state" field. Missing values like this correspond to NULL values on the database side and are represented by Missing["NotAvailable",...] in the result:
Click for copyable input

Another type of input that results in Missing[] values is request values for invalid property names. In this case, however, the value will have a different reason for being missing: Missing["UnknownProperty",...].

Following is an example where values for nonexistent property "foo" are requested:
Click for copyable input

Core Query Building Blocks


This section describes the core primitives that the Entity framework provides to build more complex queries.

First, property-like queries are covered. They are explained and illustrated with a number of examples concerning how one can significantly expand the set of available entity properties with computed properties, and how EntityFunction is used to express those computed properties.

The rest of this section is dedicated to various primitives that take entity classes (and other parameters) and return new (transformed) entity classes. The Entity framework provides such primitives for the following typical data-processing operations:

EntityFunction and Property-like Queries

Property-like queries are Wolfram Language expressions used to build parts of Entity framework queries, which get compiled into SQL expressions in the process of query compilation and execution. The main construct used to construct property-like queries is EntityFunction.

Property-like queries are useful in a number of places, such as defining computed properties, predicates for filtering data, sorting criteria, aggregation and more. In this tutorial, some typical examples of them are considered.

One can use standard arithmetic operations: Times, Plus, Subtract, Divide, Mod, etc. on numeric quantities.

The following computes the total amount paid for each ordered item, taking into account ordered items' quantities:
Click for copyable input

A number of string manipulation functions are supported as well.

The following extracts the value of the "state" property for each entity in the "offices" type and also computes the Boolean expression, which checks whether this property value starts with the letter "C":
Click for copyable input

To test for missing values, the MissingQ predicate can be used.

The following example shows how one can use the MissingQ predicate to check for missing valuesin this case, for the property "state" for the type "offices":
Click for copyable input

Standard comparison operators are also supported. Note that Equal and SameQ can be used interchangeably, as well as Unequal and UnsameQ.

The following example shows the use of comparison operators Equal, SameQ, Unequal, UnsameQ in the context of string comparison:
Click for copyable input

One can construct fairly complex expressions of numeric and other types , which will be translated into SQL and executed on the database side.

The following query computes the ratio of the difference between MSRP and buy price, to the MSRP price, which is the percentage of income the store gets from selling the item, if it sells it under MSRP price:
Click for copyable input
Here is a more complicated metric, also involving Power and Sqrt, just for an illustration:
Click for copyable input

Boolean expressions are particularly important, since Boolean-valued EntityFunction expressions are frequently used as filtering predicates, conditions for CombinedEntityClass, etc.

The following query computes a Boolean property that yields True for items ordered in quantity >30 and with price per item >= $200, for each entity of the "orderdetails" type, where Wolfram Language-side post-processing selects only those products for which it yields True:
Click for copyable input
The following query computes a property that is only True for order numbers divisible by 100:
Click for copyable input
The following query computes a Boolean property that yields True for all offices for which either their city contains letter "a", or their state exists (in not Missing[]) and their city contains letter "o":
Click for copyable input
The following query computes a Boolean property that yields True for all orders that are required to be shipped within eight days from the date of placing the order:
Click for copyable input
In the following example, a Boolean-valued query expression is used as a filtering criteria inside FilteredEntityClass, to find all orders where an item was ordered with quantity >30:
Click for copyable input

A Note on Entity Class Transformers

The operations described in the sections that follow, namely FilteredEntityClass, SortedEntityClass, SampledEntityClass, ExtendedEntityClass, AggregatedEntityClass and CombinedEntityClass, can all be called entity class transformers. They all take an entity class (or two in the case of CombinedEntityClass) and return a new entity class.

It is important to realize that these constructs are completely symbolic and inert. No actual work is performed when one of these is applied to an entity class argument. The resulting query stays symbolic, and one needs to call one of the resolver functions (EntityValue or EntityList) to actually perform the query.

Following is an example of a query containing several of these transformers, nested within one another.

This selects the five top-paying customers based on their past payments and lists them in the order of decreasing total amount paid:
Click for copyable input

When evaluated, the symbolic Entity framework query evaluates to itself, staying as an inert symbolic expression. One can still do a few useful things with it though, such as, for example, finding all entity properties for the entity class that this query represents (this does not require a call to the database).

The following returns entity properties for the entity class defined by the query:
Click for copyable input

One uses EntityValue to actually execute the query.

One can execute this query by using EntityValue:
Click for copyable input

The symbolic and inert nature of Entity framework queries opens the way for programmatic query construction, since queries on their own are inert Wolfram Language expressions and can be constructed from smaller building blocks either manually or programmatically.

Filtering with FilteredEntityClass

One of the most commonly needed operations is to filter database data according to certain criteria. In Entity framework, FilteredEntityClass is used for that purpose, with the filtering criteria expressed using EntityFunction and passed to FilteredEntityClass as a second argument.

The following examples illustrate typical uses of FilteredEntityClass.

The following lists first name, last name and job title for all employees with a job title other than "Sales Rep":
Click for copyable input
The following query selects all employees with a short email name (<= 5 characters in length):
Click for copyable input
The following query finds all employees whose name starts with one of "M", "P", "D":
Click for copyable input
The same result could also have been obtained with the following query:
Click for copyable input

Sorting with SortedEntityClass

Sorting entities (rows in the database table) according to some criteria is another commonly used operation. On the database side, this is achieved by using ORDER BY clause of SQL. On the Entity framework side, SortedEntityClass can be used to achieve that.

In the simplest case, one needs to sort with respect to the values of a single existing entity property (database column), in an ascending order. In this case, the string field name is passed to SortedEntityClass as a second argument.

This is illustrated with the following example.

The following sorts employees with respect to the office code:
Click for copyable input

In case descending order of results is required, one can use the fieldName -> "Descending" syntax, as shown following.

The following query sorts employees with respect to the office code, in descending order:
Click for copyable input

It is also possible to sort by a property-like query.

The following query lists employee number, first name and last name for all employees, sorted by the length of the first name, in an ascending order:
Click for copyable input

One can sort by more than one property. In this case, values of the second property are used as tie-breakers for groups of items with identical value of the first property, and the same logic extends to the third etc. sorting property. One can attach the "Ascending" or "Descending" qualifiers for each sorting property separately, to control the sorting order within the (sub)groups of entities.

The following query sorts products by product line and then by quantity in stock in descending order:
Click for copyable input

One can limit the number of results returned by SortedEntityClass, by utilizing its third argument.

Here is one of the previous queries, listing information about employees sorted by the length of their first name, where the result is now limited to the first seven records:
Click for copyable input

Subsetting with SampledEntityClass

In some cases, it is useful to pick only a certain number of entities from an entity class (or, in database terms, pick only a certain subset of rows from a table or virtual table). The way to do this in the Entity framework is to use SampledEntityClass.

The following examples illustrate its typical usage.

The following query picks 10 payments from the "payments" type (table):
Click for copyable input
The following does the same but skips the first 10 entries:
Click for copyable input

Note that the ordering of results can only be guaranteed if SampledEntityClass is applied to SortedEntityClass. However, even in other cases it can still be quite useful, for example, when one wants to peek into a large dataset or prototype a query using just a small sample first.

The following query returns the names and prices for the top five most expensive products:
Click for copyable input
In this case, the order is guaranteed by the fact that the dataset has been sorted before subsetting. This same result can be obtained more economically with SortedEntityClass with three arguments:
Click for copyable input

Introducing New Properties with ExtendedEntityClass

It is possible to add new, computed properties to the set of available entity properties for a given type/entity class. The result is a new entity class. The newly added property can be used anywhere the original properties canin EntityValue, further in the outer layers of the query, etc. The construct that creates such a new extended entity class is ExtendedEntityClass.

The following example illustrates a simple use case where one new property is added to an entity class.

The following query adds a new property "fullName" for all employees:
Click for copyable input

One can add more than one property, in which case one should use a list.

The following uses relations (described in more detail here, the property "employees-reportsTo") to add the first and last name of the manager, for employees who have one:
Click for copyable input

The new property can use various constructs, including conditional logic etc. This capability allows nontrivial computations.

The following query adds a new, adjusted credit limit for the customers by incrementing the limit for those customers who have the current credit limit >=$100,000, by $15000:
Click for copyable input

Aggregation with AggregatedEntityClass

Aggregation is a very commonly needed operation. It is used to compute values, aggregated over more than one entity (table row in the database). Typical examples of aggregation include computing totals, averages, or minimal and maximal values of some properties or more complex expressions, over a range of entities. Within the Entity framework, the construct that can be used to perform aggregation is AggregatedEntityClass.

As its name indicates, the result of its application is a new entity class. If the aggregation is performed over the entire original entity class, the resulting entity class will contain just a single entity, having as its properties the computed aggregated properties. One can also first group entities based on values of a certain set of their properties, and then perform aggregation over each such groupin which case the new entity class will contain as many such new aggregated entities as there are such groups.

At present, there is a limited and small set of core aggregation functions that are supported by the Entity framework for database-backed entity stores: Total, Length, Min, Max, Mean, Variance and StandardDeviation. One can use standard arithmetic functions and other property-like query building blocks, however, to combine these core operations and compute more complex expressions.

It is important to note that EntityFunction for the case of aggregation is bound to an entire entity class being aggregated rather than individual entities of that class.

Following is a simple example of use of AggregatedEntityClass to compute a single aggregate value.

The following query computes the total number of all ordered items:
Click for copyable input

To understand the semantics of EntityFunction in this case, it is useful to compute the same thing on the top level, using the Wolfram Language.

This code computes the same quantity using the Wolfram Language for aggregation, where Function is used to make the analogy most apparent:
Click for copyable input

There are more concise ways to compute aggregated properties.

One can use a simpler syntax, which uses the same name for the resulting aggregated property as the name for the property being aggregated. This works for a simple case when there is a single property being aggregated over in each case, but one can have more than one aggregation:
Click for copyable input
And a yet more concise form exists for the case when also there is a single aggregation (returning a scalar rather than a list, however):
Click for copyable input

An important remark on the last example is that for the current version of the Entity framework, this three-argument form of EntityValue, when used for aggregation in a top-level query, will perform the aggregation on the Wolfram Language side rather than the database side (this does not happen when this syntax is used in subqueries). In contrast, all aggregations performed with an explicit use of AggregatedEntityClass will always be performed on the database side.

It is possible to compute more than one aggregated property.

The following query defines an aggregated entity class with three aggregated properties: maximal, minimal and average quantity of ordered items for all orders split into single-item orders (which is what the "orderdetails" table provides):
Click for copyable input

One can compute aggregations over certain groups of entities rather than the entire entity class. In this case, the third argument of AggregatedEntityClass has to be used, to indicate a property or a list of properties whose values will define unique groups of entities for aggregation.

The following examples illustrate typical use cases of this kind.

The following computes the average buy price for products within each product line:
Click for copyable input
The following computes how many employees each office has:
Click for copyable input

It is possible to group entities by using not a single property, but a set of properties.

The following query computes the total number of customers coming from the same city and country, sorting by the customer number in descending order:
Click for copyable input
In this case, aggregation is performed over groups of entities having the same combination of city and country.

One can use more complex expressions than just a single property inside aggregation functions.

The following query computes the total amount to be paid for all placed orders (semantically, in this case, both o["priceEach"] and o["quantityOrdered"] represent columns in the database. Their multiplication is allowed, since they belong to the same table (and thus have the same length) and must be understood as vectorized element-wise operations. The result of this multiplication is semantically another column, which is passed to Total):
Click for copyable input
The following query computes the average amount per ordered item in two different ways: by explicit computation and by using the built-in function (Mean):
Click for copyable input

Combining Entity Classes with CombinedEntityClass

In relational databases, data is typically stored in several tables, connected to each other via foreign key constraints. Only in rare cases does one work with just a single table for the real-world queries one typically wants to write, and considering that the data, being normalized, is typically split and stored in a (possibly large) number of related tables.

The most common way to use data from more than one table in an SQL query is to use SQL JOIN operations. The Entity framework counterpart of SQL JOIN is the CombinedEntityClass construct. It takes two entity classes/types, the specification that tells how they are going to be combined, and optionally the type of JOIN. The result is a new entity class, with new entities containing properties from both of the original entity classes.

The following examples illustrate typical use cases for CombinedEntityClass.

The following query combines entity classes "employees" and "offices", via the "officeCode" property, and then extracts properties "employeeNumber", "firstName", "lastName", "city" and "country", where the former three belong to the type "employees" and the latter two to the type "offices":
Click for copyable input
This can become more apparent if one looks at the full list of properties for the new class (it is more instructive to actually look at their InputForm):
Click for copyable input
The entities of the new class are of a new type, which can be seen by e.g. using EntityList:
Click for copyable input

The new entity classes resulting from application of CombinedEntityClass can be further used in more complex queries, just like any other entity classes.

The following query finds all employees who work in France or the UK and lists their employee numbers, first and last names and the country:
Click for copyable input
The following returns, for customers with a credit limit larger than $120,000, the payment information including customer number, name, amount paid and the date of payment (where the Wolfram Language post-processing step is required since dates are stored in Unix time on the database side):
Click for copyable input

Note that in the last example it was necessary to explicitly indicate EntityProperty["customers","customerNumber"] in the list of properties in EntityValue, since both types "customers" and "payments" have the property with this name, and one has to disambiguate exactly which property is requested.

The following displays customer name and the first and last name of their sales representative, for customers from France:
Click for copyable input

It is possible and often desirable to combine a type with itself. In the context of relational database operations, this case corresponds to a self-join. In this case, more care is needed to disambiguate property namesone has to introduce a (string) alias for at least one of the same types being combined. This case is illustrated with the following example.

The following query lists first and last names of employees, together with the names of their managers. The "manager" alias for the second "employees" type is introduced, which is then used to disambiguate the entity properties:
Click for copyable input

In some cases, the condition that is tested to decide whether a combination of the two entities from two entity classes being combined should be included into the resulting entity class is more complex than a simple equality between some (combinations of) properties of both entities. For such cases, one can use EntityFunction with two arguments, each one bound to an entity of the corresponding entity class being combined. The body of such an EntityFunction must return a Boolean value, but otherwise can be an arbitrarily complicated (compilable) expression.

The following example illustrates such a case.

The following is a rather interesting example, since it illustrates several different aspects of CombinedEntityClass usage. This query represents one way of finding all countries that have more than one sales office. The idea is to combine the "offices" type with itself, and as a condition for CombinedEntityClass use the predicate that checks that the countries of two entities being combined are the same, while office codes are different. Any such combination of two "office" type entities does correspond to a country that has more than one office. So it only remains to extract the country names of the resulting combined entities and remove possible duplicates:
Click for copyable input

In the last example, it is also necessary to prefix one of the combined "offices" types, to disambiguate properties, just like it was in the previous example of self-join. Finally, DeleteDuplicates is used as a third argument of EntityValue to remove the duplicates from the results (and as was noted before, it is important to remember that in cases like this, the third argument of EntityValue is currently executed on the Wolfram Language side, rather than on the database).

Other Tools


Subqueries are queries that are used as building blocks in more complex queries. They take the simplest form when the result of a subquery is a scalar. In this section, several examples of subqueries are used to illustrate their general uses.

As a starting point, consider a common task of obtaining a single aggregated property from some entity class. This will not yet be a subquery. The following examples illustrate various ways to do this.

The following aggregation query returns the maximal value of the "MSRP" property for all the products:
Click for copyable input
One can get the same result using the following alternative syntaxes:
Click for copyable input

It is important to note, however, that while the results are the same, the first (shorter) version does in fact perform the aggregation in the Wolfram Language, while the last two versions aggregate on the database side.

However, when such queries are used as subqueries in larger queries, in that case they are always executed on the database side, including the short three-argument form of EntityValue.

The following is a simple example of a query containing a subquery.

The following query uses the previous query as a subquery. It selects all products with the MSRP price within the top 10% of most expensive products, in terms of the price range:
Click for copyable input

Subqueries like the one in the last example are called uncorrelated, since they do not refer to entities (table rows) of the outer layers of the query. One can also build so-called correlated subqueries, which do contain such references (and therefore, cannot be executed on their own without modifications).

The following example illustrates the steps one typically goes through to construct queries involving more complex and/or correlated subqueries.

The following query computes the total number of employees for an office with the office code "4":
Click for copyable input
One could also do the same computation with the office's entity, where the hard-coded "4" has been removed from the condition, and now there are nested EntityFunction expressions, with the inner one referencing the outer one's variable:
Click for copyable input
It is now easy to understand the following code, which computes the number of employees for each office and sorts the offices w.r.t. their number of employees in descending order:
Click for copyable input

The last example is an example of a correlated subquery. The correlation is reflected by the inner EntityFunction's body referencing the outer EntityFunction's variable.

Note that in many cases, it is straightforward to replace correlated subqueries with joins.

The join version of the last example query is much simpler:
Click for copyable input

Subqueries are a powerful tool, but they can be rather easily abused. Depending on the case, they may or may not be the optimal way to solve a given problem. Deeply correlated subqueries might result in an inferior performance, so one should use this powerful tool carefully.


The MemberQ predicate in the context of database-backed entity stores is used in Entity framework queries to expose the SQL IN operator. In the simplest form, it is used to test membership of a value in an explicit list of values.

The following example illustrates such use cases.

The following property defined by EntityFunction yields True for an office with the office code being one of "2", "4" or "7", and False otherwise:
Click for copyable input

One can also use MemberQ in conjunction with subqueries, in which case, the first argument of MemberQ is not a literal list, but a result of a subquery. In such cases, a subquery should return a column rather than a scalar, and typically can be written as class["property"] or, equivalently, EntityValue[class,"property"], where class is some entity class (registered or defined by an inner query).

The following example illustrates such use cases.

The following query selects customers who live in some of the cities where the offices are located, within the US:
Click for copyable input

DeleteDuplicates and SQL DISTINCT

In some cases, one needs to only keep those selected values or groups of values that are distinct. In SQL, a special DISTINCT keyword is used for this purpose. In the Entity framework, one can use DeleteDuplicates as a third argument of EntityValue to achieve a similar effect.

The following query returns a list of all distinct values for the "status" property of the type "orders":
Click for copyable input

In the previous example, as was noted before, DeleteDuplicates is currently executed on the Wolfram Language side. However, when a query such as the preceding is used as a subquery, the removal of duplicates will happen on the database side.

A somewhat different use case is when one needs to aggregate over distinct values. Also in this case, one can use DeleteDuplicates inside EntityFunction to achieve that.

The following computes the number of distinct exact prices, as well as rounded prices, and also average distinct price and average rounded price (ignoring the price multiplicities):
Click for copyable input


In this tutorial, entity-valued and entity-class-valued entity properties, generated by the Entity framework for related entity types (database tables), are called relations.

In addition to the properties that correspond to existing table columns, new properties are created by the Entity framework for tables that are related to other tables. They constitute the mechanism to provide a higher-level (w.r.t. explicit joins or subqueries) way to use data from multiple related entity types (database tables) in queries.

Such properties have been already considered previously, but on the most basic level.

It is important to realize that there is nothing that can be done with relations that cannot be done with core primitives. However, using relations in many cases can lead to considerably more concise queries, which require a lot less effort to both construct and understand.

The examples below illustrate relations.

Consider properties for type "employees":
Click for copyable input

There are four properties that do not correspond to existing database columns in this example: "offices", "customers", "employees-reportsTo" and "employees-reverse".

One of the simplest ways to understand relations is in the context of single entities.

Consider some particular employee:
Click for copyable input
The properties in question correspond respectively to customers served by this employee, office for which this employee works, the manager of this employee and all employees for whom this employee is the manager:
Click for copyable input

Relations can be entity valued or entity-class valued.

The following query finds an entity class of all coworkers for the given employee:
Click for copyable input
The following is a more interesting query that returns all employees who work in the same office and report to the same manager as the given employee:
Click for copyable input

One can use relations to economically formulate queries. This is illustrated with the following examples.

Here is how one can use relations to compute the number of employees for each office:
Click for copyable input
If that property is needed further in the query, one can use ExtendedEntityClass to extend a given entity class with that property:
Click for copyable input

An important feature of relations is that one can follow them more than once, particularly for entity-valued relations.

The following query demonstrates the use of relations by adding to each order two new properties: the name of the customer who placed this order and the full name of the employee who served that customer. Note how relations are followed to extract the data belonging to related types (database tables):
Click for copyable input

One can use entity-class-valued relations in queries everywhere where entity class is expected.

The following example computes the total number of all customers and also the total number of customers with high credit limit (>$50000), for each employee, and adds these values as new properties. In this case, e["customers"] is an entity-class-valued relation, which can be used e.g. in FilteredEntityClass.
Click for copyable input

More examples of how relations can be used to build more complex queries are considered in the final section of this tutorial.

Entity Framework and SQL


The purpose of this section is to give a basic idea about the kind of SQL code that is typically generated by the framework for various supported constructs. The generated SQL provided here is mostly intended for illustration purposes.

While many of the SQL queries presented in this section actually correspond to the SQL code generated by the current version of the Entity framework query compiler (for the SQLite backend), one should not assume that the generated SQL will always take that exact form. It may change from version to version. From the viewpoint of the Entity framework, the exact form of generated SQL is an internal implementation detail, as long as the end result is correct and has reasonable efficiency. So one should not rely in any way on the details of generated SQL code shown in this section.

EntityValue and EntityList Calls

You can start with some basic EntityValue call:

Click for copyable input

This would correspond to a simple SQL SELECT statement like the following:

SELECT officeCode, city, state, country
FROM offices

In cases when one uses computed properties based on EntityFunction in EntityValue, they are aliased on the SQL side by the autogenerated property names.

For example, the following query:

Click for copyable input

Translates into SQL code like this:

SELECT REGEXP(state, '^C', 0) AS synthetic_prop_9
FROM offices

Such generated properties as "synthetic_prop_9" are not visible to the user, since EntityValue extracts only the property values when properties based on EntityFunction are requested.

For the case of EntityList, the database call actually extracts the values of the properties that are required to compute the CanonicalName for the entities of a given type (primary key for the corresponding database table).

For example, for the case of "offices" type:

Click for copyable input

That would be the "officeCode" property:

SELECT officeCode
FROM offices

Computed Properties and EntityFunction Expressions

Computed properties are defined using EntityFunction. These will typically compile into SQL expressions. Here are a few examples.

Consider the "orderdetails" type. The expression:

Click for copyable input

Will compile into SQL expression similar to:

orderdetails.priceEach * orderdetails.quantityOrdered

Consider the "offices" type. The following:

Click for copyable input

Will compile into:

offices.state IS NULL

Logical operators translate rather straightforwardly.

For example:

Click for copyable input

Will translate into something like this:

REGEXP(, 'a', 0) OR REGEXP(, 'o', 0) AND NOT (offices.state IS NULL)

In simple cases, arithmetic operations will compile into similar SQL-level arithmetic operations. E.g. for the "products" type:

Click for copyable input

Will compile into something like:

(products.MSRP - products.buyPrice) / products.MSRP

However, in some cases, the generated SQL code might involve type transformations/coercion.

For example, the following:

Click for copyable input

May compile into something like this (where conversions to reals are performed to preserve the semantics of Wolfram Language operations involved here, on the database side):

power(CAST((power(CAST(products.MSRP AS REAL), 2) - power(CAST(products.buyPrice AS REAL), 2)) / (power(CAST(products.MSRP AS REAL), 2) + power(CAST(products.buyPrice" AS REAL), 2)) AS REAL), 0.5)

Comparison operators translate straightforwardly.

For example, for the "orderdetails" type, the following:

Click for copyable input

Will translate into something like:

orderdetails.quantityOrdered > 30 AND orderdetails.priceEach >= 200

While e.g. this:

Click for copyable input

Will be translated into:

orderdetails.orderNumber % 100 = 0

The aggregation expressions are transformed similarly.

For example, the following:

Click for copyable input

Will get translated into:


While this:

Click for copyable input

Translates into something like:

sum(orderdetails.quantityOrdered * orderdetails.priceEach) / CAST(count(orderdetails.orderLineNumber) AS REAL)

More complex expressions get translated into the more complex SQL expressions.

For example, for the "employees" type, the following:

Click for copyable input

Will get translated into something like this:

    WHEN (1 > length(employees.firstName)) THEN NULL
    WHEN 1 THEN substr(employees.firstName, 1, 1)
END IN ('M', 'P', 'D')

Core Query-Building Primitives


This construct generally corresponds to the SQL WHERE clause, with the second argument (being the EntityFunction expression) corresponding to the WHERE clause SQL expression.

For example, this query:

Click for copyable input

Will translate into:

SELECT employees.firstName, employees.lastName, employees.jobTitle
FROM employees
WHERE employees.jobTitle != 'Sales Rep'

When it can, the query compiler tries to optimize the query. In particular, successively applying several conditions will not usually result in many nested levels of SELECT in the resulting SQL code.

For example, the following query uses three nested FilteredEntityClass constructs to find all customers from USA, CA, who are located in one of the cities: San Francisco, Los Angeles or San Jose:

Click for copyable input

The generated SQL will look something like this:

FROM customers
WHERE = 'USA' AND customers.state = 'CA' AND IN ('San Francisco', 'Los Angeles', 'San Jose')

Where all three different conditions have been collapsed into a single one.


This construct corresponds to the SQL ORDER BY clause.

For this example:

Click for copyable input

The resulting query may look like this:

SELECT employees.employeeNumber, employees.lastName, employees.officeCode 
FROM employees
ORDER BY employees.officeCode DESC

One notable difference between SQL and the Entity framework sorting facilities is that not all SQL backends directly support expressions in the ORDER BY clause, while for SortedEntityClass the properties with respect to which one sorts can be either simple entity properties or EntityFunction expressions.

For example, the following query sorts employees by the length of their first name:

Click for copyable input

And in this case, the generated query becomes somewhat more complex:

SELECT "T308"."employeeNumber", "T308"."firstName", "T308"."lastName" 
        "employees_T306"."employeeNumber" AS "employeeNumber",
        "employees_T306"."firstName" AS "firstName",
        "employees_T306"."lastName" AS "lastName",
        length("employees_T306"."firstName") AS synthetic_prop_17
    FROM employees AS "employees_T306"
) AS "T308"
ORDER BY "T308".synthetic_prop_17


The most direct SQL counterparts for this construct are LIMIT and OFFSET SQL keywords. However, the actual strategy and internal implementation for subsetting may be different, since in certain cases other strategies may be more efficient to obtain the same result.

For the following query:

Click for copyable input

The generated SQL may look like this:

    "payments_T316"."customerNumber" AS "customerNumber",
    "payments_T316".amount AS amount
FROM payments AS "payments_T316"


On the SQL side, this construct corresponds to the use of SQL expressions to define fields in the SELECT list, whenever one wants to compute properties more complex than just the original fields present in the database table or query.

Such expressions can be relatively simple, like in the case of the following query:

Click for copyable input

Which generates SQL similar to this:

    "employees_T328"."employeeNumber" AS "employeeNumber",     
    ("employees_T328"."firstName" || ' ') || "employees_T328"."lastName" AS "fullName"
FROM employees AS "employees_T328"

Or they can be complex and contain e.g. correlated subqueries like, for example, for the following query, which uses relations:

Click for copyable input

Where in the generated SQL, the last two fields in the SELECT list are represented by (correlated) scalar subqueries:

    "employees_T352"."employeeNumber" AS "employeeNumber",
    "employees_T352"."firstName" AS "firstName",
    "employees_T352"."lastName" AS "lastName",
        SELECT "employees_T355"."firstName" AS "firstName_1"
        FROM employees AS "employees_T355"
        WHERE "employees_T355"."employeeNumber" = "employees_T352"."reportsTo"
    ) AS "managerFirstName",
        SELECT "employees_T358"."lastName" AS "lastName_1"
        FROM employees AS "employees_T358"
        WHERE "employees_T358"."employeeNumber" = "employees_T352"."reportsTo"
    ) AS "managerLastName"
FROM employees AS "employees_T352"


When used without the third argument, AggregatedEntityClass represents aggregation performed over the entire first argument (entity class), and corresponds to SQL aggregate queries. In this case, there is typically no special SQL-side keyword that would correspond to the AggregatedEntityClass[...], but the fields in the SELECT list must all use the SQL aggregation functions.

The following query, for example:

Click for copyable input

Will translate into SQL similar to this:

    max("orderdetails_T403"."quantityOrdered") AS "maxOrdered",
    min("orderdetails_T403"."quantityOrdered") AS "minOrdered",
    avg("orderdetails_T403"."quantityOrdered") AS "avgOrdered"
FROM orderdetails AS "orderdetails_T403"

When the third argument of AggregatedEntityClass is used, this corresponds to the SQL GROUP BY clause.

For example, the query:

Click for copyable input

Will translate into something like this:

    "customers_T434".city AS city,
    "customers_T434".country AS country,
    count("customers_T434"."customerNumber") AS "customerCount"
FROM customers AS "customers_T434"
GROUP BY "customers_T434".city, "customers_T434".country

It is important to note that for the operations that one may want to further perform on the AggregatedEntityClass, often the generated SQL will have an extra layer of SELECT.

For example, should one wish to sort the entities resulting from the previous query by the value of the "customerCount" field:

Click for copyable input

The generated SQL query now contains an extra level of SELECT:

        "customers_T495".city AS city,
        "customers_T495".country AS country,
        count("customers_T495"."customerNumber") AS "customerCount"
    FROM customers AS "customers_T495"
    GROUP BY "customers_T495".city, "customers_T495".country
) AS "T497"
ORDER BY "T497"."customerCount" DESC


This construct corresponds to SQL JOIN.

Here is an example of a simple query of this type, which combines types "employees" and "offices":

Click for copyable input

Where the generated SQL may look like this:

    "employees_T529"."employeeNumber" AS "employeeNumber",
    "employees_T529"."firstName" AS "firstName",
    "employees_T529"."lastName" AS "lastName",
FROM employees AS "employees_T529"
        "offices_T532".city AS city,
        "offices_T532".country AS country,
        "offices_T532"."officeCode" AS "officeCode"
    FROM offices AS "offices_T532"
) AS "T534"
ON "employees_T529"."officeCode" = "T534"."officeCode"

Here is a considered earlier example of a query using self-join with a more complicated join condition, which finds all countries having at least two different offices:

Click for copyable input

Which will translate into SQL similar to this (note that currently DeleteDuplicates in the top-level EntityValue is executed on the Wolfram Language side, so there is no DISTINCT keyword in the SQL query):

SELECT "offices_T571".country AS country 
FROM offices AS "offices_T571"
        "offices_T574".country AS country_1,
        "offices_T574"."officeCode" AS "officeCode"
    FROM offices AS "offices_T574"
) AS "T576"
ON "offices_T571".country = "T576".country_1 AND "offices_T571"."officeCode" != "T576"."officeCode"


In this tutorial, subqueries normally means parts of the larger query, which can be expressed using EntityValue. On the SQL side, most of the time this corresponds to the inner SELECT statement with a single field, typically returning a scalar (either because there is a single row or because aggregation is being performed), but sometimes also returning a column (to be used in the IN clause, which would correspond to the use of MemberQ on the Entity framework side).

The following example (already considered before in the section on subqueries, and returning expensive products) represents a simple, uncorrelated subquery:

Click for copyable input

Which gets translated into something like this:

    "products_T583"."productName" AS "productName",
    "products_T583"."MSRP" AS "MSRP"
FROM products AS "products_T583"
WHERE "products_T583"."MSRP" >= 0.9 * (
    SELECT max("products_T586"."MSRP")
    FROM products AS "products_T586"

Where the query inside the product is a scalar subquery.

The following more complex version of the previous query extends each product with the number of products that are within a window of $15 from the current product, in terms of MSRP:

Click for copyable input

Where the subquery inside EntityFunction that defines the "closelyPricedProductsCount" extended property is a correlated subquery, since the filtering of products now depends on the price of the current product, which has to be referenced from that filtering/aggregation subquery.

This results in a query containing a correlated subquery, where the correlated subquery is in the SELECT list of the inner query and is aliased by "closelyPricedProductsCount", becoming a new extended property:

            SELECT "T640".count
            FROM (
                SELECT count("products_T638"."productCode") AS count
                FROM products AS "products_T638"
                WHERE abs("products_T638"."MSRP" - "products_T635"."MSRP") <= 15
            ) AS "T640"
        ) AS "closelyPricedProductsCount",
        "products_T635"."MSRP" AS "MSRP",
        "products_T635"."productName" AS "productName"
    FROM products AS "products_T635"
) AS "T637"
ORDER BY "T637"."closelyPricedProductsCount" DESC, "T637"."MSRP" DESC

There is currently no automatic optimization of the generated SQL code involving subqueries performed by the Entity framework query compiler (such as attempts to convert them into JOINs etc.). One should be aware of the performance implications of the use of correlated subqueries in the Entity framework, which are similar to those for SQL.


Relations provide a high-level way to perform lookups for properties in entity classes/types related to a given one (related database tables), without performing explicit joins. Their internal implementation can utilize different tools to achieve this goal, such as subqueries and/or joins, but these are hidden from the user.

For an example of the type of SQL that can be generated from queries that use relations, consider the following query that computes, for each office, the maximal number of customers that a single employee in that office deals with (it is also considered later in the last section of this tutorial):

Click for copyable input

For the current version of the query compiler, the SQL generated for this query may look like this (which is admittedly not very easy to read):

    "offices_T652"."officeCode" AS "officeCode",
        SELECT "T662".synthetic_prop_20
        FROM (
            SELECT max("T657"."customerCount") AS synthetic_prop_20
            FROM (
                    SELECT "T660".synthetic_prop_21
                    FROM (
                        SELECT count("customers_T658"."customerNumber") AS synthetic_prop_21
                        FROM customers AS "customers_T658"
                        WHERE "employees_T655"."employeeNumber" = "customers_T658"."salesRepEmployeeNumber"
                    ) AS "T660"
                ) AS "customerCount"
                FROM employees AS "employees_T655"
                WHERE "offices_T652"."officeCode" = "employees_T655"."officeCode"
            ) AS "T657"
        ) AS "T662"
    ) AS "maxEmployeeCustomerCount"
FROM offices AS "offices_T652"

The actual generated SQL for queries using relations might change in the future, as a result of optimizations or change of internal implementation.

Programmatic Query Construction and Generation


The symbolic nature of Entity framework queries allows one to generate such queries programmatically. There are a number of use cases for this capability.

One such use case is to reuse certain parts of the query in more than one place. That can be in either several different queries, or in one and the same query.

One important point to remember is that EntityFunction is HoldAll, and therefore, if one needs to use some part of the query stored in a variable, in the body of EntityFunction, one has to use With (or a similar construct) to inject that query part into the body of EntityFunction.

Reusing Query Parts in Different Queries

Reusing inert query blocks in several larger queries can be a common need in practice. Since queries are inert, one can easily do so. In particular, one can store smaller parts of a query in variables and use those variables in larger queries.

The following examples illustrate such uses.

Consider the query that aggregates over the payments and produces total payments for all customers:
Click for copyable input
These can be used on their own right, for example to find the five top-paying customers:
Click for copyable input
However, it can also be used to, for example, get total payments of all customers served by a specific employee:
Click for copyable input

Reusing Query Parts in the Same Query

In some cases, one may need to reuse the same query more than once in one and the same larger query. In SQL, the WITH keyword is used to achieve that. While the Entity framework might acquire native support for this construct in the future, one can emulate it rather easily.

The following examples illustrate such use cases.

Consider the following query, which computes the total number of times each particular product has been ordered, by aggregating over all orders:
Click for copyable input
It can be used on its own:
Click for copyable input
But it can also be used to, for example, find all items that were ordered the most, in which case it will appear twice in the query. Note that in this case, With is used, since totalOrderDetails should be injected into the body of EntityFunction:
Click for copyable input

Programmatic Query Generation

One can use the standard Wolfram Language techniques to generate query expressions programmatically.

The following example illustrates this by building an aggregate query that computes various aggregated quantities.

First, generate the AggregatedEntityClass:
Click for copyable input
Then execute the query, generating the list of string property names:
Click for copyable input

This example might be somewhat artificial, but it illustrates the main idea, which is that one can use standard Wolfram Language functions and idioms to automate the query construction process.

Symbolic Query Transformations

Programmatic access to queries and their symbolic nature can also be useful for certain more advanced scenarios, such as nontrivial query transformations or generation.

As an example, consider operator forms for common query building blocks. At present, for various reasons there is no direct support for those, for the core Entity framework query building blocks. Yet users often prefer this style of query writing, and it can in some cases make queries more readable by reducing the amount of nesting in the query.

Below is a simple implementation of an alternative syntax that would support operator forms.

One can start by defining a new set of symbols, which would serve as proxies for the Entity framework query primitives, but would support operator forms:
Click for copyable input
The next step is to define the entityClassQ predicate:
Click for copyable input
Next, the "compile" function is defined:
Click for copyable input
Finally, define the operator forms:
Click for copyable input

You may try this now on some example query.

The following example query is written in operator style and does the following: selects the five top-paying customers and returns their customer number, customer name and total amount they have paid, sorted in the order of decreasing total amount:
Click for copyable input

Of course, the symbolic nature of the queries and the excellent Wolfram Language capabilities for symbolic expression manipulations open many more possibilities; this is just one such example.

Practical Query-Building Techniques


This section contains brief example-driven coverage of a few practical techniques of query construction, which may be useful to have in one's toolbox.

One of the impediments to building complex queries is that the process may look and feel somewhat similar to writing code in compiled languages, where it is hard to test intermediate results or code pieces, and one has to come up with some complete function or program to compile and test it.

This section is intended to show that this is not really the case for Entity framework queries, and that with proper techniques, one can make the query-building process as interactive as most other activities in the Wolfram Language.

Building Queries Incrementally

The query to illustrate the technique described in this section should do the following: select the five top-paying customers and return their customer number, customer name and total amount they have paid, in the order of decreasing total amount.

Several steps are shown, which start from a very basic query and gradually build the query of interest, being guided by the intermediate results.

The starting point can be to just look at the contents of the "payments" table/entity type. But for prototyping purposes, it may make sense to take a smaller sample of entities of the "payments" type:
Click for copyable input
Click for copyable input
One can see that there typically are several payments for a given customer. To compute the total amount each customer has paid, one therefore needs to aggregate over these values, grouping by the customer number:
Click for copyable input
Since the customer name is also needed, one way to get it is to use CombinedEntityClass, as follows:
Click for copyable input

Note that in the last query, in the property list it is important to use the full EntityProperty["customers","customerNumber"], rather than just "customerNumber", since CombinedEntityClass[...] now contains two "customerNumber" properties, and it is necessary to disambiguate which one is really requested.

The next step is to sort the result with respect to the total amount paid, by using SortedEntityClass with a proper specification:
Click for copyable input
Next, select only the five top customers, by using SampledEntityClass:
Click for copyable input
The final step is to replace the sample used so far (samplePayments) with the full "payments" type:
Click for copyable input

This example illustrates how one can gradually build the query, one step at a time, taking each time the query built on the previous step as a starting point and checking the intermediate results.

Prototyping Complex Nested Queries Using Single Entities

It is often useful to prototype queries using single entities of a given entity type, before actually constructing them for the entire entity type. This section illustrates this process using a concrete example.

The query of interest is to select all offices where at least two employees have customers.

As the starting point, pick one of the offices:
Click for copyable input
Before building the actual query, get what you can using the top level. The starting point is to see which employees work in this office:
Click for copyable input
The next quick check is to see which customers are served by these employees:
Click for copyable input

Note that this is a very inefficient approach, which can only make sense in the prototyping stage, since it leads to a large number of queries being executed.

The next step is to construct a query that computes the number of customers for each employee. As a first step, one can hard-code the specific employee number. For example, from the preceding it follows that employee with the number 1165 should have six customers.

This is confirmed by the following query:
Click for copyable input
The next step is to remove the hard-coded employee number, which can be done as follows:
Click for copyable input

Where in the last query, an extra layer of EntityFunction is introduced, which allowed the specific employee entity to be passed into the query rather than its number being hard-coded inside of it.

Now one can test the resulting EntityFunction on all employees of a given office, which produces the result one would expect from the above top-level analysis based on EntityList:
Click for copyable input

The next step can be made by following the same logic, but now for the office entity.

Here is the query that computes the number of employees with customers, for the office in question:
Click for copyable input

Where the previously constructed query has been used as an inner building block, and also the relation o["employees"] has been used, just as previously inside the EntityList.

Everything is now ready for the construction of the original query of interest. It may look as follows:
Click for copyable input

Note how using single entities, one can incrementally build nested complex queries, making sure that things work at each level. This particular query technically contains a doubly nested subquery, one level of which is correlated.

This logic also works in reverse: given a query that does not function properly and contains complicated inner structure (involving nested EntityFunction expressions etc.), one can break it into pieces and test inner parts of the query on single entities, to quickly locate and fix the problematic place.

Errors and Error Handling

Various errors that might occur during the query execution fall into two broad categories: soft and hard errors.

Soft Errors

Soft errors are those that result in normal query evaluation, however not returning the results that would have been produced with a fully correct query.

An example of a soft error is when one tries to extract nonexistent property values from a type:
Click for copyable input

In which case, Missing["UnknownProperty",] values are returned for nonexistent property.

Another example of a soft error is when one tries to call EntityList or EntityProperties on a nonexistent type:
Click for copyable input

In which case, Missing[] values are also returned.

Hard Errors

For the purposes of this tutorial, hard errors are those that return Failure objects. The user-level error-handling may therefore amount to checking EntityValue return values for Failure objects.

Following are listed some of the most commonly encountered hard errors.

Invalid property in EntityFunction

One example of such an error is using an invalid property in EntityFunction.

The following attempts to compute an expression involving the nonexistent property "foo":
Click for copyable input

Uncompilable expressions in EntityFunction

Query compilation fails whenever the body of EntityFunction contains uncompilable parts.

The following query fails because of the presence of a global variable y in the query, which has no value and cannot be compiled to SQL:
Click for copyable input
The same happens here, but this time because the database-side BesselJ computation is not currently supported:
Click for copyable input

Incompatible types for expressions in EntityFunction

Another common source of hard errors is when one uses expressions of the wrong type in EntityFunction.

In the following query, the Greater operator cannot take string arguments:
Click for copyable input
While in the following case, an attempt to add together an integer and a string results in a type error:
Click for copyable input

Usually, the error message is rather informative in such cases in terms of identification of the cause of the error.

Values on unsupported types present in the query

Some types of Wolfram Language expressions are not currently supported by the Entity framework in the context of relational databases.

The following query fails because the value of 10^100 is too large to be used in a database query:
Click for copyable input
The following does not fail, however, since here the number has been explicitly converted to the machine-precision double:
Click for copyable input
Complex numbers are not supported by database backends directly, so the following results in an error as well:
Click for copyable input

Incompatible return type in EntityFunction

Some operations that take EntityFunction as an argument require specific return types. For example, when used in FilteredEntityClass or CombinedEntityClass, EntityFunction should have a Boolean return type.

In the following example, the return type of the filtering predicate EntityFunction is integer, while Boolean type is required:
Click for copyable input

Attempt to return a nonscalar from EntityFunction

At present, such functionality is not supported.

Here is an example where the result of EntityFunction semantically is a list of values, the type that cannot be returned from EntityFunction:
Click for copyable input

Improper use of relational lookups in aggregation

It is technically possible to construct an invalid aggregation query using relations, by combining columns from different tables in the same aggregation operation. Such queries fail to compile.

The following query fails since the subtraction is attempted essentially with different columns of different tables, which is not the valid query:
Click for copyable input

Not using aggregation functions in EntityFunction for AggregatedEntityClass

Expressions used to compute aggregated properties in EntityFunction for the case of aggregation must use one of the aggregation functions. It is technically possible to construct queries that do not, in which case they will not compile.

The following query fails, because the body of EntityFunction here has essentially a type of "column" (list of values) rather than a scalarwhich looks similar to another error already discussed, but here it is in the context of aggregation:
Click for copyable input
The following does not fail, since it contains an aggregation function (Total in this case), thereby making the body of EntityFunction be a scalar:
Click for copyable input

Operational errors

Operational errors are errors that are happening on the database side. While the Entity framework makes an effort to intercept most syntactic, type-related and other errors in the queries early, in some cases this either has not yet been done or might not be easily possible to do.

The following example is an attempt to extend a type "offices" with a new property that is in fact an entity class. Such operations are not currently supported by database-backed entities, and the error in this case happens to occur on the database side:
Click for copyable input

Bringing Everything Together: Examples of More Complex Queries

Most interesting real-world questions require queries that combine several query-building primitives in nontrivial ways. This section illustrates how one can achieve this, with a number of more interesting examples.

Example: Customers Sorted by the Total Amount They Paid

The following query extends the "customers" type with a property "totalPaid" that gives the total amount of money paid by that customer and then sorts by "totalPaid" value in descending order.

One solution would be to use a correlated subquery.

Here is the version that uses a correlated subquery:
Click for copyable input

The same can be achieved, using instead CombinedEntityClass and aggregation with grouping (notice that here one has to add those properties of the "customers" type that one wants to be available after aggregation to the list of properties one uses for grouping. Even though there is a single value of "customerName" that corresponds to specific value of "customerNumber", the query does not know that, unless we tell it this explicitly).

The following version uses CombinedEntityClass:
Click for copyable input

One can do the same thing even more economically by using relations.

This version uses relations:
Click for copyable input

Example: Total Number of Customers Dealt with by Each Office

One way of dealing with this type of problem is to use CombinedEntityClass to combine the types together and then aggregate over the combined type, grouping over some of the properties.

The following query computes the total number of customers served by all employees in the office, for each office. It does this by combining three types: "offices", "employees" and "customers", and then performing aggregation over office code values:
Click for copyable input

Note that in the last query, it was important to use the longer form EntityProperty["offices","officeCode"] instead of just "officeCode" to disambiguate the property, since CombinedEntityClass["offices","employees","officeCode"] contains two properties with the short name "officeCode": EntityProperty["offices","officeCode"] and EntityProperty["employees","officeCode"] (in this case, either one could be used).

The following more complex version of the preceding query computes the total number of customers served by each office and residing in the country where that office is located. In this case one has to use the full EntityProperty for the "country" property to disambiguate properties, since both types "offices" and "customers" have the property "country":
Click for copyable input

Example: Maximal Number of Customers per Employee, for Each Office

For cases such as this, relations often offer very economical solutions.

The following query computes, for each office, the maximal number of customers that a single employee in that office deals with. It relies on relations rather heavily:
Click for copyable input

Just to appreciate how much work relations are doing for the user here, below is the SQL generated for this query by the current version of the query compiler:

    "offices_T652"."officeCode" AS "officeCode",
        SELECT "T662".synthetic_prop_20
        FROM (
            SELECT max("T657"."customerCount") AS synthetic_prop_20
            FROM (
                    SELECT "T660".synthetic_prop_21
                    FROM (
                        SELECT count("customers_T658"."customerNumber") AS synthetic_prop_21
                        FROM customers AS "customers_T658"
                        WHERE "employees_T655"."employeeNumber" = "customers_T658"."salesRepEmployeeNumber"
                    ) AS "T660"
                ) AS "customerCount"
                FROM employees AS "employees_T655"
                WHERE "offices_T652"."officeCode" = "employees_T655"."officeCode"
            ) AS "T657"
        ) AS "T662"
    ) AS "maxEmployeeCustomerCount"
FROM offices AS "offices_T652"

Example: Customers Who Owe Money

This is another showcase to illustrate the power of relations. The task is to select customers who still owe moneyin other words, those for whom the amount of total orders exceeds the amount of total payments to date.

To compute total amount to pay, one can start with all the orders for a given customer, which is given by the relation c["orders"] and is an entity class of type "orders". Since each order may contain multiple items, and each item may be ordered in multiple quantities, the next step is to extend this class with a property "totalToPay", which computes the amount to be paid for each order by reaching to all "orderdetails" entities related to this order and computing the total of the product of item price and ordered quantity. Note that here, relation o["orderdetails"] is used for each order. Then one can do the second aggregation, this time over all the orders for a given customer, to obtain the total amount they should pay.

The computation of the total amount the customer already paid is considerably simpler, since it requires a single relational lookup c["payments"]. The small cutoff 0.00001 is introduced to avoid roundoff errors.

The query is as follows:
Click for copyable input

It takes considerably more work to obtain the same result without using relations.

The following query is one possible way to do this and uses correlated subqueries and CombinedEntityClass.
Click for copyable input

The With in the last example is used for readability and is not required; one could have used a single large query. Note however that With is used with := initialization for scoped variables, so that they get injected into the body of the main EntityFunction without evaluation.

Example: Total Payments by the Five Top-Paying Customers, for Each Employee

The goal for this example is to compute for each employee the total amount to be paid by the five top-paying customers served by that employee, and sort employees in the order of descending total amount (which can be used as a measure of success for a given employee).

Using relations here can considerably simplify the query.

The following query is using relations. It illustrates how one can follow relations and compute aggregates conveniently (like in Total[c["payments"]["amount"]] below). Note that in this case, the use of relations allows one to use data from three different database tables ("employees", "customers" and "payments"), in a concise and economical fashion:
Click for copyable input

The following version does not use relations.

Again, obtaining the same result without using relations takes a lot more effort:
Click for copyable input

In the above example, as also in the example before it, With is used mostly for readability.