Unified async query execution

Topics: EF Runtime
Jul 22, 2012 at 6:47 PM
Edited Jul 22, 2012 at 7:38 PM

I have been digging into the implementation of the new async features of Entity Framework. One thing that came to my mind is that these features form a completely separate solution with "internal" interfaces. I think this could fragment the .NET ecosystem (e.g. Rx already has an IAsyncEnumerable interface). Furthermore, I think one of the main advantages of LINQ is that it is completely transparent: the underlying data engines can be completely abstracted away (e.g. look at how the ASP.NET Web API utilizes this). It would be really nice if the same were true for the new async features.
Obviously, this would require introducing the appropriate interfaces in mscorlib, so that the async features of EF could be built on them. Unfortunately, this could be problematic at the current development stage of .NET.

The following interfaces and code sample present my idea:

IAsyncQueryable (for async operations), IAsyncQueryProvider (IDbAsyncQueryProvider), IAsyncEnumerable (IDbAsyncEnumerable), IAsyncEnumerator (IDbAsyncEnumerator)


IQueryable<Person> q = efContext.People;
IAsyncQueryable<Person> aq = q.AsAsync(); // The main idea is to separate the async methods completely; however, this "conversion" can easily be left out of the solution, since IAsyncQueryable is not a must-have

var person = await aq.First(); // or FirstAsync?
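
A minimal sketch of how such shared interfaces could look, for illustration only. The member shapes are assumptions and are not taken from EF, Rx or the BCL; every type name carries a "Sketch" suffix so it cannot be mistaken for a real type. The in-memory implementation and the ToListAsyncSketch helper are likewise hypothetical and exist only to show how a consumer would drive the enumerator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface shapes (assumed, not a real API).
public interface IAsyncEnumeratorSketch<out T> : IDisposable
{
    T Current { get; }
    Task<bool> MoveNextAsync(CancellationToken cancellationToken);
}

public interface IAsyncEnumerableSketch<out T>
{
    IAsyncEnumeratorSketch<T> GetAsyncEnumerator();
}

public interface IAsyncQueryProviderSketch : IQueryProvider
{
    Task<TResult> ExecuteAsync<TResult>(Expression expression, CancellationToken cancellationToken);
}

public interface IAsyncQueryableSketch<out T> : IAsyncEnumerableSketch<T>
{
    Expression Expression { get; }
    IAsyncQueryProviderSketch Provider { get; }
}

// In-memory implementation, only to show how the enumerator is consumed.
public sealed class InMemoryAsyncEnumerable<T> : IAsyncEnumerableSketch<T>
{
    private readonly IEnumerable<T> _items;
    public InMemoryAsyncEnumerable(IEnumerable<T> items) { _items = items; }

    public IAsyncEnumeratorSketch<T> GetAsyncEnumerator()
    {
        return new Enumerator(_items.GetEnumerator());
    }

    private sealed class Enumerator : IAsyncEnumeratorSketch<T>
    {
        private readonly IEnumerator<T> _inner;
        public Enumerator(IEnumerator<T> inner) { _inner = inner; }
        public T Current { get { return _inner.Current; } }

        public Task<bool> MoveNextAsync(CancellationToken cancellationToken)
        {
            cancellationToken.ThrowIfCancellationRequested();
            return Task.FromResult(_inner.MoveNext()); // completes synchronously
        }

        public void Dispose() { _inner.Dispose(); }
    }
}

public static class AsyncEnumerableSketchExtensions
{
    // Drains the sequence into a list, awaiting each step.
    public static async Task<List<T>> ToListAsyncSketch<T>(
        this IAsyncEnumerableSketch<T> source,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        var result = new List<T>();
        using (var e = source.GetAsyncEnumerator())
        {
            while (await e.MoveNextAsync(cancellationToken))
            {
                result.Add(e.Current);
            }
        }
        return result;
    }
}
```

With shapes like these, application code would only ever see the awaitable helper, e.g. await new InMemoryAsyncEnumerable<int>(new[] { 1, 2, 3 }).ToListAsyncSketch().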


Is there any plan to achieve unification?

Coordinator
Jul 23, 2012 at 12:45 AM
Edited Jul 23, 2012 at 12:46 AM

@tamasflamich: Thanks for the great insights! This is an area in which we have been forced to make some compromises, so we are actively seeking feedback to validate that the compromises we picked are all right. As usual, we expect to have to iterate over the design several times until we get most things right :)

Let me try to answer your questions while giving you some context on how we arrived at the current design...

Common interfaces

First of all, we are definitely trying to avoid causing fragmentation in the ecosystem. We are hopeful that the EF interfaces are hidden enough from the experience (e.g. if you are writing application code you should never really need to cast to any of the interfaces, but just call ForEachAsync, ToListAsync, FirstAsync, etc.) that if a clear standard for async collections emerges, we will be able to switch to it. By making the interfaces EF-specific we are also opting out of having EF become that standard for now (as there are other teams that are in a better position to produce it).

I completely agree that one of the advantages of LINQ is how it abstracts the underlying query providers. We in fact talked a lot with the Languages, TPL and Rx teams to try to define together a set of common async collection abstractions we could share. As you mention, it would have been nice for us if such abstractions existed in the BCL, but the reality is that they don't exist and it is not certain that they will be added. We also looked at the IAsyncEnumerable interfaces as defined by Rx, but in the end we came to the conclusion that it was more appropriate to create a solution that was local to EF. That way we can:

  1. Focus on getting the experience right for application developers
  2. Keep the number of concepts developers need to learn to a minimum
  3. Retain the ability to interop with the Rx interfaces through some kind of adapter
  4. Enable framework-like code to do more advanced things by casting to our public interfaces 

Async queries

At some point we thought about having something like AsAsync(). Besides the strong advice of the Rx team against it, it felt at the time like a somewhat unnecessary abstraction:

In LINQ there is currently no strict separation between query construction and query execution, i.e. you usually perform most query construction using IQueryable<T> operators but if you happen to use one of the “immediate” query operators then the query executes immediately. We asked ourselves the question: Could we somehow draw a more strict line between query construction and execution so that we could reuse all the existing LINQ operators for query construction but defer to the execution phase the decision on whether we want to process the query synchronously or asynchronously?

Based on this idea, our primary design for execution of immediate queries is a method that you won’t find in the code base yet:

Task<T> QueryAsync<T>(Expression<Func<T>>)

The method can be used like this:

var customer = await db.QueryAsync(() => db.Customers.First(c => c.Id == id));

You may notice that this expression is using the regular First operator. It works because the call to First is in this case being captured as part of an expression and not really being executed. Expression<Func<T>> is indeed an apt representation of a deferred query that returns a single element, which is what we were looking for!
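
The capture behaviour can be seen in isolation with plain LINQ to Objects. The following is only a sketch: ExpressionCaptureDemo and OutermostCall are hypothetical names introduced for illustration, and nothing here touches EF or the QueryAsync method described above. It shows that a lambda converted to Expression<Func<T>> records the call to First as a node in the tree instead of running it.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class ExpressionCaptureDemo
{
    // Returns the name of the outermost method call in the captured lambda body.
    // The lambda itself is never compiled or invoked.
    public static string OutermostCall<T>(Expression<Func<T>> query)
    {
        var call = query.Body as MethodCallExpression;
        return call != null ? call.Method.Name : "(not a method call)";
    }

    public static string Demo()
    {
        var empty = Enumerable.Empty<int>().AsQueryable();

        // First() on an empty sequence would throw if it actually executed;
        // here it is only recorded in the expression tree.
        return OutermostCall(() => empty.First(x => x > 0));
    }
}
```

Calling ExpressionCaptureDemo.Demo() returns "First" without throwing, which is the property a QueryAsync-style method relies on: it receives a description of the query and is free to decide how to execute it.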

The plan is to add this method at a later point. Once we do so, it will allow any LINQ to Entities-recognized expression to be processed asynchronously on the server, e.g. it will even be possible to execute a query that doesn’t start with an IQueryable<T>:

var distance = await db.QueryAsync(() => spatialPoint1.Distance(spatialPoint2));

Since this is a generally useful thing to do, we are planning to add a synchronous version of the same method as well.

Async immediate LINQ operators

Initially we were going to add only ForEachAsync, ToListAsync and QueryAsync for asynchronous queries, but later we decided to add async versions of the other immediate LINQ operators as well, a compromise justified mostly by method discoverability. So you can now do something like this:

var customer = await db.Customers.FirstAsync(c => c.Id == id);

In my opinion these methods are a bit weird, but they are very convenient. In any case, we are seeking feedback on these (both their existence and their implementation). A few details:

  1. These methods are purposely defined on a sponsor class in our own namespace so that they don’t accidentally “pollute” other IQueryable<T> queries unless you have imported System.Data.Entity.
  2. In their current implementation, rather than creating an expression with a call to themselves as most immediate LINQ operators do, they compose an expression with a method call to the non-async version and then invoke the IDbAsyncQueryProvider.ExecuteAsync method.
  3. They throw if the IQueryProvider cannot be cast to IDbAsyncQueryProvider.
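
Points 2 and 3 can be sketched roughly as follows. This is not the EF source: IAsyncQueryProviderSketch is a hypothetical stand-in for IDbAsyncQueryProvider with an assumed ExecuteAsync signature, and FirstAsyncSketch is a made-up name used so it cannot be confused with the real operator.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for IDbAsyncQueryProvider (assumed shape).
public interface IAsyncQueryProviderSketch : IQueryProvider
{
    Task<TResult> ExecuteAsync<TResult>(Expression expression, CancellationToken cancellationToken);
}

public static class AsyncQueryableSketch
{
    public static Task<T> FirstAsyncSketch<T>(
        this IQueryable<T> source,
        Expression<Func<T, bool>> predicate,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        // Point 3: fail if the provider has no async support.
        var provider = source.Provider as IAsyncQueryProviderSketch;
        if (provider == null)
        {
            throw new InvalidOperationException(
                "The query provider does not support asynchronous execution.");
        }

        // Point 2: build a call to the regular, synchronous Queryable.First
        // (not to this method) and hand that expression to ExecuteAsync.
        Func<IQueryable<T>, Expression<Func<T, bool>>, T> first = Queryable.First;
        var call = Expression.Call(
            first.Method,
            source.Expression,
            Expression.Quote(predicate));

        return provider.ExecuteAsync<T>(call, cancellationToken);
    }
}
```

Because the composed expression contains only the standard First call, a provider needs no knowledge of the async operator itself; it only has to offer an asynchronous way of executing expressions it already understands.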

My main concern with these methods is that they may get in the way of testability, e.g. for developers used to taking advantage of the abstraction of IQueryable<T>, implementing an additional interface in fakes and mocks can turn out to be an additional burden. But this is something I haven’t tried myself, so it might turn out not to be a big deal. The implementation of the methods also feels a little weird, mainly due to #2.
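
To give a feel for the extra work in a test double, here is a rough sketch of an in-memory fake. All names are hypothetical: IFakeableAsyncQueryProvider stands in for IDbAsyncQueryProvider with an assumed ExecuteAsync signature, and the fake simply delegates to the LINQ to Objects provider and completes synchronously.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for IDbAsyncQueryProvider (assumed shape).
public interface IFakeableAsyncQueryProvider : IQueryProvider
{
    Task<TResult> ExecuteAsync<TResult>(Expression expression, CancellationToken cancellationToken);
}

// Delegates all execution to the wrapped LINQ to Objects provider.
public sealed class FakeAsyncQueryProvider : IFakeableAsyncQueryProvider
{
    private readonly IQueryProvider _inner;
    public FakeAsyncQueryProvider(IQueryProvider inner) { _inner = inner; }

    public IQueryable CreateQuery(Expression expression)
    {
        // Left out to keep the sketch short.
        throw new NotSupportedException("Use the generic overload in this sketch.");
    }

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        // Keep composed queries on the fake provider.
        return new FakeAsyncQueryable<TElement>(this, expression);
    }

    public object Execute(Expression expression) { return _inner.Execute(expression); }

    public TResult Execute<TResult>(Expression expression)
    {
        return _inner.Execute<TResult>(expression);
    }

    public Task<TResult> ExecuteAsync<TResult>(Expression expression, CancellationToken cancellationToken)
    {
        // Runs synchronously and wraps the result in a completed task.
        return Task.FromResult(_inner.Execute<TResult>(expression));
    }
}

public sealed class FakeAsyncQueryable<T> : IQueryable<T>
{
    private readonly FakeAsyncQueryProvider _provider;

    public FakeAsyncQueryable(IEnumerable<T> data)
    {
        var inner = data.AsQueryable();
        _provider = new FakeAsyncQueryProvider(inner.Provider);
        Expression = inner.Expression;
    }

    public FakeAsyncQueryable(FakeAsyncQueryProvider provider, Expression expression)
    {
        _provider = provider;
        Expression = expression;
    }

    public Type ElementType { get { return typeof(T); } }
    public Expression Expression { get; }
    public IQueryProvider Provider { get { return _provider; } }

    public IEnumerator<T> GetEnumerator()
    {
        return _provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

A test would wrap its data with new FakeAsyncQueryable<Customer>(customers) and pass that wherever an IQueryable<Customer> is expected. It is two extra classes compared with calling AsQueryable() directly, which is roughly the burden in question.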

We looked at several other options as well. I will need to spend some more time explaining them if you are interested :)

Thanks,
Diego

Jul 23, 2012 at 8:08 AM

Thank you for providing such a detailed explanation of the design decisions. It is good to see that the team is well aware of the problem and is trying to give us the best possible solution. I now see that the lack of standard BCL interfaces will be a problem for some time and that, unfortunately, their introduction is quite uncertain at this point. I hope that this will end in a nice way.

I am indeed very much interested in the "other options", but I would prefer to read them (along with the things already mentioned here) in a great post on the ADO.NET blog ;)

Coordinator
Jul 26, 2012 at 11:03 PM

Thanks for the feedback! By the way, you may be interested in the design overview for async that we posted today: http://entityframework.codeplex.com/wikipage?title=Task-based%20Asynchronous%20Pattern%20support%20in%20EF.