EF6 will support the new simplified approach to asynchronous programming introduced in Visual Studio 2012/.NET 4.5. You can read more about it here.


Goals

Calling long-running methods asynchronously can have a positive effect on these aspects of a data driven solution:

  • Server Scalability
  • Client responsiveness

Server Scalability

In typical server-side scenarios, multiple threads are allocated to process multiple overlapping client requests.

Whenever one of these threads performs a blocking operation (i.e. an operation that accesses the network or some other form of I/O) the thread will remain idle waiting for the operation to complete. If during the interval new requests arrive, each of them is assigned its own thread.

Depending mainly on the rate at which new client requests arrive, and the time it takes for each request to be served, the number of threads executing (or blocked) on the server might grow rapidly.

Threads have a significant memory footprint (about 1 MB of virtual memory space each); therefore too many simultaneous threads on a server can easily exhaust the available memory long before processor utilization comes near 100%, making memory the main bottleneck for the server’s throughput.

In cases like these it is possible to improve scalability without changing hardware by using non-blocking calls when communicating with external resources: threads don’t need to waste time waiting for those calls to complete, and instead can be returned to the thread pool so that they can be reused to service other incoming requests.

Non-blocking I/O calls help in keeping the thread count low, removing the memory bottleneck and making it possible for the application to scale better on the same hardware.
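The difference between a blocking and a non-blocking wait can be sketched as follows. This is a minimal illustration, not EF code: Thread.Sleep and Task.Delay stand in for a synchronous and an asynchronous database call, and the names ScalabilitySketch, HandleRequestBlocking and HandleRequestAsync are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ScalabilitySketch
{
    // Blocking version: the calling thread sits idle for the whole wait.
    public static int HandleRequestBlocking(int id)
    {
        Thread.Sleep(200); // stands in for a synchronous database call
        return id;
    }

    // Non-blocking version: while the wait is pending, no thread is held.
    public static async Task<int> HandleRequestAsync(int id)
    {
        await Task.Delay(200); // stands in for an asynchronous database call
        return id;
    }

    public static async Task Main()
    {
        // 100 overlapping "requests" are in flight without 100 blocked threads.
        int[] results = await Task.WhenAll(
            Enumerable.Range(0, 100).Select(id => HandleRequestAsync(id)));
        Console.WriteLine("Completed " + results.Length + " requests");
    }
}
```

Running the same 100 requests through HandleRequestBlocking on separate threads would keep every one of those threads parked for the duration of the wait, which is the memory cost described above.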

For an illustrative description of this impact see F# async on the server side.

Client Responsiveness

On the other hand, even with the progress and broad availability of high-speed network connectivity, latency is still one of the dominant factors in the usability of distributed applications. Data intensive applications in the cloud can rapidly degrade if their user interfaces spend most of the time blocked waiting for responses from the server.

The new Task-based Asynchronous Pattern (TAP) provides a simple way to call asynchronously the long-running methods that currently make the UI unresponsive.

Non-Goals

The following are things we are explicitly not trying to enable with the feature in EF6:

  • Thread safety
  • Async lazy loading

Thread Safety

While thread safety would make async more useful, it is an orthogonal feature. It is unclear whether we could ever implement support for it in the most general case, given that EF interacts with a graph composed of user code to maintain state and there aren't easy ways to ensure that this code is also thread safe.

For the moment, EF will detect if the developer attempts to execute two async operations at the same time and throw an exception.
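One way such a check could work is sketched below. This is an assumption for illustration only, not EF's actual implementation: SingleOperationGuard and RunAsync are hypothetical names, and the exception type and message are chosen for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical guard that allows only one in-flight async operation.
public sealed class SingleOperationGuard
{
    private int _busy; // 0 = idle, 1 = an async operation is in flight

    public async Task<T> RunAsync<T>(Func<Task<T>> operation)
    {
        // Atomically claim the guard; fail fast if another operation holds it.
        if (Interlocked.CompareExchange(ref _busy, 1, 0) != 0)
        {
            throw new NotSupportedException(
                "A second operation started before the previous one completed.");
        }

        try
        {
            return await operation().ConfigureAwait(false);
        }
        finally
        {
            // Release the guard whether the operation succeeded or failed.
            Volatile.Write(ref _busy, 0);
        }
    }
}

public static class GuardDemo
{
    public static async Task Main()
    {
        var guard = new SingleOperationGuard();

        // The first operation is still pending when the second one starts.
        Task<int> first = guard.RunAsync(async () => { await Task.Delay(100); return 1; });
        try
        {
            await guard.RunAsync(() => Task.FromResult(2));
        }
        catch (NotSupportedException ex)
        {
            Console.WriteLine("Rejected: " + ex.Message);
        }

        Console.WriteLine("First result: " + await first);
    }
}
```

The guard only detects overlap; it does not make the guarded state thread safe, which matches the non-goal stated above.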

Async Lazy Loading

Lazy loading was one of the most requested features we added in .NET 4. It is quite a powerful feature because it allows “virtualizing” the navigation over a graph of objects that is actually stored in the database, providing the illusion that it is completely loaded into memory. This allows for better separation of concerns and simpler code, but it has a number of disadvantages.

One of the main criticisms of lazy loading is that the cost of reading a property becomes nondeterministic. It seems that there is no place for this kind of nondeterminism in the scenarios in which we expect Task-based async to be critical.

In other words, there is an argument that someone who is optimizing for server throughput should not use lazy loading and should instead use eager or explicit loading.

However, there is a hybrid approach we could consider supporting in the future:

public class Order
{
    public virtual Task<Customer> CustomerAsync { get; set; }
    …
}

var order = await context.Orders.FindAsync(1);
var customer = await order.CustomerAsync;

In this case, EF would need to recognize that a property that returns a Task<T>, where T is an entity type, is actually an “async navigation property”, and create the appropriate Object-Conceptual mapping for the actual property. Since the property is virtual, EF can generate a dynamic proxy that implements the lazy loading.
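To make the idea concrete, here is a hand-written sketch of what such a proxy might do. It is purely illustrative: EF does not implement async navigation properties, the Order and Customer classes are minimal stand-ins, and LazyOrderProxy with its loader delegate is a hypothetical substitute for a generated dynamic proxy.

```csharp
using System;
using System.Threading.Tasks;

public class Customer
{
    public string Name { get; set; }
}

public class Order
{
    public virtual Task<Customer> CustomerAsync { get; set; }
}

// Hypothetical stand-in for a dynamic proxy generated at runtime.
public sealed class LazyOrderProxy : Order
{
    private readonly Func<Task<Customer>> _loader; // would issue the database query
    private Task<Customer> _cached;

    public LazyOrderProxy(Func<Task<Customer>> loader)
    {
        _loader = loader;
    }

    public override Task<Customer> CustomerAsync
    {
        // Start the load on first access and hand back the same task afterwards.
        get { return _cached ?? (_cached = _loader()); }
        set { _cached = value; }
    }
}

public static class ProxyDemo
{
    public static async Task Main()
    {
        int loads = 0;
        Order order = new LazyOrderProxy(async () =>
        {
            loads++;
            await Task.Delay(10); // stands in for the round trip to the database
            return new Customer { Name = "Contoso" };
        });

        Customer first = await order.CustomerAsync;
        Customer second = await order.CustomerAsync;
        Console.WriteLine(first.Name + ", loads: " + loads + ", same instance: " + ReferenceEquals(first, second));
    }
}
```

The sketch caches the Task rather than the entity, so repeated reads of the property share one load. It is not safe for concurrent first access, in line with the thread-safety non-goal.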

We should keep in mind that using the Task-based async patterns with properties dilutes some of the transparency that lazy loading provides. That might be OK: thanks to TAP support in the language, the code doesn’t look too different from the sync version; the navigation just becomes explicitly asynchronous.

Another important challenge would be how to refer to an async navigation property in a LINQ expression, given that the .NET languages currently don't support construction of lambda expressions containing await.

Dependencies

We are able to provide async support in EF by basing our implementation on the new async API in ADO.NET provider model and the async and await keywords introduced in Visual Studio 2012/.NET 4.5.

However, if a specific provider doesn’t implement the asynchronous methods, they will fall back to synchronous execution without any warning.
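The fallback can be pictured with the sketch below, which mirrors the documented default behavior of the ADO.NET base classes (the default async methods invoke their synchronous counterparts and return a completed task). SketchCommand and LegacyCommand are hypothetical types written for this example, not real provider classes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical base class standing in for a provider's command type.
public abstract class SketchCommand
{
    public abstract int ExecuteNonQuery();

    // Default async method: runs the synchronous one and wraps the outcome.
    // A provider with true async I/O would override this.
    public virtual Task<int> ExecuteNonQueryAsync(CancellationToken cancellationToken)
    {
        if (cancellationToken.IsCancellationRequested)
        {
            return Task.FromCanceled<int>(cancellationToken);
        }

        try
        {
            return Task.FromResult(ExecuteNonQuery()); // blocks the caller
        }
        catch (Exception ex)
        {
            return Task.FromException<int>(ex);
        }
    }
}

// A provider that never overrides the async method.
public sealed class LegacyCommand : SketchCommand
{
    public override int ExecuteNonQuery()
    {
        Thread.Sleep(50); // stands in for a blocking database call
        return 1;
    }
}

public static class FallbackDemo
{
    public static void Main()
    {
        Task<int> task = new LegacyCommand().ExecuteNonQueryAsync(CancellationToken.None);

        // The work already ran on the calling thread, so the task is complete.
        Console.WriteLine("Completed before await: " + task.IsCompleted);
        Console.WriteLine("Rows affected: " + task.Result);
    }
}
```

Code that awaits such a method still compiles and runs correctly, but it gains none of the scalability benefits because the calling thread is blocked for the duration of the call.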

Async support in EF requires .NET 4.5 and will not be available on .NET 4.

Design

We are aiming to introduce async versions of the operations that perform network I/O and could become the bottleneck on either the client or the server. This includes all operations that cause results to be materialized from the database (e.g. foreach, ToList, Single, SqlQuery, etc.) and operations that cause commands to be sent to the database (e.g. SaveChanges, ExecuteSqlCommand, etc.).

We are following the generally accepted standard of introducing a second, asynchronous version of each method, using the Async suffix (e.g. SaveChanges and SaveChangesAsync).

Our main intent is to provide async versions of methods on the DbContext API. Where these methods require an asynchronous counterpart on the ObjectContext API we will also add that method (e.g. SaveChanges). In some cases we may also implement additional async methods on the ObjectContext API for the sake of completeness.

We are not attempting to provide asynchronous database/schema creation.

For methods that LINQ to Entities does not support (e.g. Last), we are not providing an Async method.

API Examples

Query

A typical example of code that causes a query to be sent to the database is iterating over the results of a LINQ query:

var query = from e in context.Employees
            where e.Name.StartsWith("a")
            select e;

foreach (var employee in query)
{
    Console.WriteLine(employee);
}

However, there’s currently no async equivalent of a foreach statement, so we will add an extension method that offers the same functionality:

await query.ForEachAsync(employee =>
{
    Console.WriteLine(employee);
});

We will also add async counterparts of the IQueryable extension methods that materialize a collection:

var employeeArray = await query.ToArrayAsync();

And those that materialize a single entity:

var firstEmployee = await query.FirstAsync();

Async versions of aggregate methods will also be provided:

var employeeCount = await query.CountAsync();

Saving changes

// Modify
var product1 = await context.Products.FindAsync(1);
product1.Name = "Smarties";

// Delete
var product2 = await context.Products.FindAsync(2);
context.Products.Remove(product2);

// Add
var product3 = new Product() { Name = "Branston Pickle" };
context.Products.Add(product3);

// Save
int savedCount = await context.SaveChangesAsync();

Console.WriteLine("Affected entities: " + savedCount); // 3

Loading

await context.Categories.Include(c => c.Products).LoadAsync();

Raw SQL Queries

var categories = await context.Database.SqlQuery<Category>(
    "select * from Categories").ToListAsync();

Cancellation Tokens

All the async methods have overloads that accept a CancellationToken. Since the bulk of the execution time of these methods is expected to be spent waiting for the database operation, cancellation is delegated to the ADO.NET provider.

The exception to this is ForEachAsync, because in this method we can potentially have many opportunities to cancel the operation between the calls to the database.
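The two cancellation points can be sketched as follows. This is an illustration under stated assumptions rather than EF's implementation: IAsyncRowSource<T> and ListRowSource<T> are hypothetical stand-ins for a provider-backed result reader, and ForEachSketch.ForEachAsync is a simplified version of the extension method described above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical row source; a real one would read from the database.
public interface IAsyncRowSource<T>
{
    Task<bool> MoveNextAsync(CancellationToken cancellationToken);
    T Current { get; }
}

public static class ForEachSketch
{
    public static async Task ForEachAsync<T>(
        IAsyncRowSource<T> source, Action<T> action, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();

        // While a fetch is pending, cancellation is the source's responsibility.
        while (await source.MoveNextAsync(cancellationToken).ConfigureAwait(false))
        {
            action(source.Current);

            // Between rows no database call is pending, so check the token here.
            cancellationToken.ThrowIfCancellationRequested();
        }
    }
}

// In-memory source used only to exercise the sketch.
public sealed class ListRowSource<T> : IAsyncRowSource<T>
{
    private readonly IReadOnlyList<T> _items;
    private int _index = -1;

    public ListRowSource(IReadOnlyList<T> items)
    {
        _items = items;
    }

    public T Current
    {
        get { return _items[_index]; }
    }

    public async Task<bool> MoveNextAsync(CancellationToken cancellationToken)
    {
        await Task.Delay(10, cancellationToken).ConfigureAwait(false); // simulated fetch
        return ++_index < _items.Count;
    }
}

public static class CancellationDemo
{
    public static async Task Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            var rows = new ListRowSource<string>(new[] { "a", "b", "c" });
            int seen = 0;
            try
            {
                // Cancel from inside the callback once two rows have been processed.
                await ForEachSketch.ForEachAsync(
                    rows, row => { if (++seen == 2) cts.Cancel(); }, cts.Token);
            }
            catch (OperationCanceledException)
            {
                Console.WriteLine("Cancelled after " + seen + " rows");
            }
        }
    }
}
```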

Limitations

While async has some real advantages, it is not for everyone. Asynchronous invocations can introduce overhead and can degrade performance if not used correctly. As with any performance-related change, establish goals and perform measurements before making any modifications to your applications.

Implementation Challenges

Code Duplication

All of the new methods have equivalent behavior and implementation to the existing synchronous ones. However, they return Task or Task<T>, which makes it difficult to unify the implementations without decreasing the performance of the synchronous methods, since Task creation usually means allocation of a new object.

We still can extract parts of the implementation as long as they don’t contain async method calls. Also both implementations are placed consecutively in the source code, so it’s easy to change both when needed.

Performance Considerations

Some things that we have done to provide better performance are:

  • We are calling ConfigureAwait(continueOnCapturedContext: false) on the tasks before awaiting them. A large portion of the async overhead is marshaling the continuation to the original context. In a library not only is this not necessary, but it could actually result in a deadlock when called from code where context is important, like a UI thread.
  • When there’s only a single place in the method where a Task is created and it’s a tail call (i.e. the last statement in the method) then it is replaced by a delegate to avoid unnecessary awaits (but this is rarely the case).
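Both points can be illustrated with a small sketch. It assumes one reading of the second point, namely that a method whose only asynchronous operation is its last statement returns the underlying Task directly instead of awaiting it. LibrarySketch and its methods are hypothetical names, and TextReader stands in for a database call.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class LibrarySketch
{
    // Work remains after the await, so the method must be async.
    // ConfigureAwait(false) lets the continuation run without returning
    // to the caller's captured context (for example a UI thread).
    public static async Task<int> CountCharsAsync(TextReader reader)
    {
        string text = await reader.ReadToEndAsync().ConfigureAwait(false);
        return text.Length;
    }

    // The only asynchronous operation is the last statement, so the Task is
    // returned as-is: no async keyword, no state machine, no extra await.
    public static Task<string> ReadAllAsync(TextReader reader)
    {
        if (reader == null)
        {
            throw new ArgumentNullException("reader");
        }

        return reader.ReadToEndAsync();
    }
}

public static class ConfigureAwaitDemo
{
    public static async Task Main()
    {
        int count = await LibrarySketch.CountCharsAsync(new StringReader("hello"));
        string text = await LibrarySketch.ReadAllAsync(new StringReader("world"));
        Console.WriteLine(count + " " + text);
    }
}
```

One trade-off of the second form is that argument validation throws synchronously to the caller rather than surfacing through the returned Task.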

In the future we could consider the following options for further improving performance:

  • Further minimizing the number of async method calls on the stack. The compiler can’t inline methods in an async method, so each invocation in an async method carries overhead. Also, when an async method yields, all the local variables are saved to the heap. This means that async methods should have fewer local variables.
  • As a last resort for improving performance, we can drop the async keyword and implement TAP manually with TaskCompletionSource<TResult>.
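A manual TAP implementation along the lines of the last option might look like the sketch below. It is an illustration, not code from EF: ManualTapSketch is a hypothetical name and TextReader stands in for a database call. The method has no async keyword; a continuation transfers the outcome of the inner task to a TaskCompletionSource<int>.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ManualTapSketch
{
    public static Task<int> CountCharsAsync(TextReader reader)
    {
        var completion = new TaskCompletionSource<int>();

        reader.ReadToEndAsync().ContinueWith(
            inner =>
            {
                // Propagate failure, cancellation or the computed result by hand.
                if (inner.IsFaulted)
                {
                    completion.TrySetException(inner.Exception.InnerExceptions);
                }
                else if (inner.IsCanceled)
                {
                    completion.TrySetCanceled();
                }
                else
                {
                    completion.TrySetResult(inner.Result.Length);
                }
            },
            TaskContinuationOptions.ExecuteSynchronously);

        return completion.Task;
    }
}

public static class ManualTapDemo
{
    public static async Task Main()
    {
        int count = await ManualTapSketch.CountCharsAsync(new StringReader("hello"));
        Console.WriteLine(count);
    }
}
```

The manual version has to handle faults and cancellation explicitly, which is why it is better suited as a last resort than as a default style.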

You can read more about performance challenges in async methods here.

Last edited Sep 17, 2013 at 8:40 PM by RoMiller, version 20

Comments

MaxiHori Sep 12 at 2:46 PM 
Since the "hybrid approach" is not implemented yet and may never be, what is the recommended way to use navigation properties (improved productivity) together with the await feature (scalability)?

The feature that I would really love is the one that you provided as a possible future example:
var order = await context.Orders.FindAsync(1);
var customer = await order.CustomerAsync;

and the closer I can get seems to be:
var order = await context.Orders.FindAsync(1);
var customer = await Task.Run(() => order.Customer);

Can you provide a better way to keep both the functionalities at the same time?

joezen777 Nov 26, 2013 at 6:01 PM 
Thanks for the article. Answered a question for me on the importance of having virtual on your navigation properties. I had a couple of navigation properties that did not have the virtual and I was getting "sequence contains no elements" errors but it would work just fine when I stepped into while debugging. After adding the virtual, it worked. The virtual enables Entity Framework to implement overrides that probably look something like (navigationPropertyLazyLoading {get { return navigationPropertyLazyLoadingTask.Result;}} )

Halo_Four Mar 8, 2013 at 2:33 PM 
In addition to ForEachAsync I think it would be awesome if there was a method that could return an IObservable<T> to the results of a query. I know that IObservable<T> doesn't fit completely within the async/await model at the moment (which is such a huge lost opportunity, async foreach anybody?) but with the addition of Rx and LINQ queries over them they are a great tool for working with potentially asynchronous operations with multiple potential results.