Implicit boundaries in LINQ to Entities (client-side evaluation)

This document contains thinking done previously by people on the EF team on how to support more client-side evaluation in LINQ to Entities queries. It is posted here for guidance and as a reference for anyone in the community who might take on such a feature.

Overview

LINQ forces us to blur the boundary between the server and the client. For a provider like LINQ to Entities, this means that a query supplied to the stack is always partly evaluated in the client application and partly in the database server. For instance, the query

int productId = 1;
var q = 
    from p in context.DbSet<Product>()
    where p.ProductId == productId
    where DetermineProductPriority(p) == "High"
    select new XElement(
        "Product", 
        new XAttribute("Name", p.Name), 
        new XAttribute("ID", p.ProductId));

includes a selection evaluated by the server (‘p.ProductId = @productId’), but also client expressions, e.g. the binding of the free variable ‘productId’ and the materialization of XML nodes in the projection. It also includes a call to a client-side method in a predicate over a server-correlated expression, something that neither LINQ to Entities nor most other LINQ to * implementations support.

We have found it convenient to talk about LINQ to Entities as a strict implementation (all server or nothing) and about other implementations such as LINQ to SQL as hybrid implementations that split the query into client and server expressions. In reality these implementations exist along a continuum, and it makes sense to examine this continuum in more detail to clarify the current behavior and how we could improve EF.

It is convenient to discuss this continuum with respect to the following expression scopes:

  • Independent sub-expressions: in the above query, the access to the free variable ‘productId’ is compiled into something like Expression.Field(Expression.Constant(CS$<>8__locals6), fieldof(<>c__DisplayClass5.productId)), which does not depend on the current scope of the query (i.e. it is not correlated to the contents of any table in the database). The process of identifying and evaluating these independent sub-expressions has become known as expression funcletization at Microsoft.
  • Client sources: LINQ queries are typically bootstrapped by IQueryable roots, e.g. ‘context.DbSet<Product>()’ in the above example.
  • Client projections: while client sources introduce typed iterators as the root or roots for remote queries, client projections close the loop by shaping typed query results. In the above example, entity results are shaped into XML nodes, ‘new XElement…’.
  • Dependent sub-expressions: certain sub-expressions can only be evaluated by the client, e.g. the call to ‘DetermineProductPriority’, but depend on intermediate results from the server, e.g. its argument ‘p’ in ‘DetermineProductPriority(p)’.

Note that these categories are somewhat arbitrary. A client source is a kind of independent sub-expression, and a dependent sub-expression is in some ways just a generalization of a client projection. The categories are still intuitively useful, however: independent sub-expressions often map to parameters, while client sources mostly map to scans; and client projections are frequently benign, while dependent sub-expressions are often cause for concern (consider highly selective filters).
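As a minimal sketch of funcletizing an independent sub-expression, the visitor below replaces an access to a captured variable (a member access on a constant closure object) with its current value. The names ‘ClosureFuncletizer’ and ‘FuncletizerDemo’ are hypothetical and are not part of EF. The sketch inlines the value as a constant for brevity; a real provider would more likely bind it to a query parameter.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper (not an EF API): evaluates captured-variable accesses
// on the client and replaces them with constants.
public sealed class ClosureFuncletizer : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        // A captured local compiles to Field(Constant(closure), field),
        // which does not depend on any query parameter.
        if (node.Expression is ConstantExpression)
        {
            object value = Expression.Lambda(node).Compile().DynamicInvoke();
            return Expression.Constant(value, node.Type);
        }
        return base.VisitMember(node);
    }
}

public static class FuncletizerDemo
{
    public static Expression<Func<int, bool>> Funcletize(
        Expression<Func<int, bool>> predicate)
    {
        Expression body = new ClosureFuncletizer().Visit(predicate.Body);
        return Expression.Lambda<Func<int, bool>>(body, predicate.Parameters);
    }

    public static void Main()
    {
        int productId = 1;
        Expression<Func<int, bool>> predicate = id => id == productId;

        // Before: the right-hand side is a member access on a closure object.
        Console.WriteLine(predicate.Body);
        // After: the right-hand side is a plain constant.
        Console.WriteLine(Funcletize(predicate).Body);
    }
}
```

The correlated part of the predicate (the lambda parameter ‘id’) is left untouched, which is what distinguishes an independent sub-expression from a dependent one.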

Implicit boundaries are dangerous but also an essential feature of LINQ. Finding the appropriate balance is important. From feedback we have received over the years, we know that users expect magic and may be disappointed if we either throw because we’re overly strict or we end up streaming 1,000,000 rows from the database into the client to find the 10 rows matching a client predicate because we were too loose.

Independent sub-expressions and client sources

Options:

  • Necessary: free variables and constants are unavoidable. We need to allow expressions that represent field access, property access and constants in order to implement a viable LINQ provider.
  • Literals: the LINQ equivalent of value literals. Allow constants (‘1’), primitive type constructors (‘new DateTime(2008, 5, 28)’) and even array initialization patterns (‘new int[] {…}’).
  • Root construction: currently LINQ to Entities recognizes only some forms of roots inside the query. For example, ‘context.Products’ is recognized, but inline construction of query roots, such as ‘dbContext.DbSet<T>()’ and ‘context.CreateQuery<T>(string)’, is not.
  • Server-unsupported expressions that can be converted to server parameters: currently LINQ to Entities will throw if it finds any expression in a query that it cannot evaluate on the server. If we could turn an independent expression into a query parameter and we know that no part of the expression could ever be evaluated by the server, at least with the same semantics, we could funcletize it. There is an interesting challenge in deciding how much of a sub-expression we can funcletize. For instance, consider the expression ‘stringBuilder.ToString().Length’. While the ‘stringBuilder.ToString()’ part can only be evaluated on the client, ‘.Length’ could either be evaluated on the client alongside the rest of the expression or translated to LEN(@param) in the store. LEN() in SQL Server has subtly different semantics from string.Length in the CLR in that it ignores trailing blanks. We have three options for what we can funcletize:
    a. The minimal sub-expression that cannot be evaluated on the server
    b. The minimal sub-expression that cannot be evaluated on the server plus any expression that can be evaluated on either the client or the server with identical semantics
    c. The maximal sub-expression that can be evaluated on the client
    Options (a) and (b) guarantee that, at least for a particular server, all occurrences of such expressions have consistent semantics regardless of where they appear in the query. The alternative is option (c), under which any sub-expression that can be turned into a parameter is evaluated on the client. This is the most flexible approach, but it means we apply inconsistent semantics to some operators depending on which side of the boundary they find themselves on.
  • Server-unsupported expressions that cannot be turned into server parameters: we should also explore these. There are interesting solutions in which values are piped through the query to the result, which works when the value is never cracked, or is cracked only when needed, on the client. This greatly increases the cost of the feature for LINQ to Entities, which would need to introduce its own intermediate metadata representation for such values.
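To make the ‘.Length’ versus LEN() concern concrete, here is a small sketch that contrasts the two semantics on the client. ‘ServerLikeLength’ is a hypothetical stand-in that only approximates SQL Server's LEN() by dropping trailing spaces; it is not how any provider evaluates the function.

```csharp
using System;

public static class LengthSemantics
{
    // CLR semantics: string.Length counts every character, including trailing blanks.
    public static int ClientLength(string s) => s.Length;

    // Assumed approximation of SQL Server LEN(): trailing spaces are not counted.
    public static int ServerLikeLength(string s) => s.TrimEnd(' ').Length;

    public static void Main()
    {
        string value = "abc  "; // three letters followed by two spaces

        Console.WriteLine(ClientLength(value));     // prints 5
        Console.WriteLine(ServerLikeLength(value)); // prints 3
    }
}
```

Under option (c) the whole of ‘stringBuilder.ToString().Length’ would be evaluated on the client and yield the first result, while under option (a) only ‘stringBuilder.ToString()’ would become a parameter and the store would compute the second.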

Client projection

Options:

  • Composable: projections that can be composed within a query are supported as client projections. For instance, entities, complex types and “rows” can be projected but arbitrary method calls or constructors cannot.
  • Non-composable: top-level projections could include arbitrary method calls and constructors. For efficiency, reverse funcletization occurs for the client projection, e.g. ‘select ClientMethod1(ClientMethod2(e.X, e.Y), e.Z)’ becomes ‘select new { e.X, e.Y, e.Z } into f select ClientMethod1(ClientMethod2(f.X, f.Y), f.Z)’.
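The rewrite described for non-composable projections can be sketched by hand as follows. An in-memory ‘IQueryable’ stands in for the server here, and ‘Entity’, ‘ClientMethod1’ and ‘ClientMethod2’ are hypothetical; the point is only that the part before ‘AsEnumerable()’ fetches the three columns the client methods need, and the method calls run afterwards on the client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReverseFuncletizationDemo
{
    public sealed class Entity
    {
        public int X { get; set; }
        public int Y { get; set; }
        public int Z { get; set; }
        public string Unused { get; set; } // never needed by the projection
    }

    // Hypothetical client-only methods.
    public static int ClientMethod2(int x, int y) => x + y;
    public static string ClientMethod1(int sum, int z) => sum + ":" + z;

    public static List<string> Run(IQueryable<Entity> source)
    {
        return source
            .Select(e => new { e.X, e.Y, e.Z })  // server side: only the needed columns
            .AsEnumerable()                      // explicit client/server boundary
            .Select(f => ClientMethod1(ClientMethod2(f.X, f.Y), f.Z)) // client side
            .ToList();
    }

    public static void Main()
    {
        var data = new[]
        {
            new Entity { X = 1, Y = 2, Z = 3, Unused = "large payload" }
        }.AsQueryable();

        foreach (string s in Run(data))
        {
            Console.WriteLine(s); // prints 3:3
        }
    }
}
```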

Note that method calls may introduce additional round-trips to the server. Before people start shouting about “nanny state APIs”, consider that users are not complaining that they wanted the round-trips but that we failed to crack the methods to figure out how to avoid them…

Dependent sub-expressions

What happens when a sub-expression cannot be evaluated by the server but depends on intermediate results?

  • Whenever an unsupported expression is encountered, we could simply split the query at that point (modulo the kinds of local optimizations described for client projections).
  • We should attempt to push as much server logic “down” the tree as possible to minimize the amount of work in the client. This is critical where joins, selections and even some projections are involved.
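A hand-written version of such a split, assuming an in-memory ‘IQueryable’ as a stand-in for the database and a hypothetical ‘DetermineProductPriority’ implementation, might look like this. The server-translatable filter sits below the ‘AsEnumerable()’ boundary, so the client predicate only sees rows that already passed it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuerySplitDemo
{
    public sealed class Product
    {
        public int ProductId { get; set; }
        public string Name { get; set; }
        public decimal Price { get; set; }
    }

    // Hypothetical client-only method; the rule it applies is made up.
    public static string DetermineProductPriority(Product p) =>
        p.Price > 100m ? "High" : "Low";

    public static List<string> HighPriorityNames(IQueryable<Product> products, int minId)
    {
        return products
            .Where(p => p.ProductId >= minId)                  // pushed down to the server
            .AsEnumerable()                                    // query is split here
            .Where(p => DetermineProductPriority(p) == "High") // dependent sub-expression
            .Select(p => p.Name)
            .ToList();
    }

    public static void Main()
    {
        var products = new[]
        {
            new Product { ProductId = 1, Name = "A", Price = 200m },
            new Product { ProductId = 2, Name = "B", Price = 50m },
            new Product { ProductId = 3, Name = "C", Price = 300m }
        }.AsQueryable();

        foreach (string name in HighPriorityNames(products, 2))
        {
            Console.WriteLine(name); // prints C
        }
    }
}
```

If the two ‘Where’ clauses were written in the opposite order, a naive split at the first unsupported expression would stream every product to the client before filtering, which is the failure mode the second bullet warns about.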

Interface considerations

If we implement support for these patterns, we should also consider allowing the user to disable them, so that the user can exercise whatever level of control they want over the client-server partitioning of the query. In addition, we should make the partitioned plan visible to the user, either by using documented boundary expressions or through a debugger visualizer.

Implementation considerations

We can include a separate pass to identify supported and unsupported expressions in the query tree, similar to other LINQ implementations.
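One possible shape for such a pass, sketched with the standard ‘ExpressionVisitor’ class, is shown below. The whitelist of translatable declaring types and the names ‘UnsupportedCallFinder’ and ‘PartitionPassDemo’ are assumptions made for illustration; a real provider would consult its own function mappings instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical analysis pass: records method calls the server cannot evaluate.
public sealed class UnsupportedCallFinder : ExpressionVisitor
{
    // Assumption for illustration only: methods declared on these types translate.
    private static readonly HashSet<Type> TranslatableTypes =
        new HashSet<Type> { typeof(string), typeof(Math) };

    public List<MethodCallExpression> Unsupported { get; } =
        new List<MethodCallExpression>();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (!TranslatableTypes.Contains(node.Method.DeclaringType))
        {
            Unsupported.Add(node);
        }
        return base.VisitMethodCall(node); // keep walking into the arguments
    }
}

public static class PartitionPassDemo
{
    // Hypothetical client-only method.
    public static string DetermineProductPriority(string name) =>
        name.Length > 3 ? "High" : "Low";

    public static List<string> FindUnsupported(Expression expression)
    {
        var finder = new UnsupportedCallFinder();
        finder.Visit(expression);
        return finder.Unsupported.ConvertAll(call => call.Method.Name);
    }

    public static void Main()
    {
        Expression<Func<string, bool>> predicate =
            name => name.StartsWith("A") && DetermineProductPriority(name) == "High";

        foreach (string method in FindUnsupported(predicate))
        {
            Console.WriteLine(method); // prints DetermineProductPriority
        }
    }
}
```

The pass only classifies nodes; deciding where to split the query or which sub-expressions to funcletize would be a later step that consumes its output.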

Last edited Mar 6, 2013 at 4:44 PM by ajcvickers, version 2