Background

Applications connecting to a database server have always been vulnerable to connection breaks due to back-end failures and network instability. However, in a LAN based environment working against dedicated database servers these errors are rare enough that extra logic to handle those failures is not often required. With the rise of cloud based database servers such as Windows Azure SQL Database and connections over less reliable networks it is now more common for connection breaks to occur. This could be due to defensive techniques that cloud databases use to ensure fairness of service, such as connection throttling, or to instability in the network causing intermittent timeouts and other transient errors.

Goals

  • Pri0: Provide the ability to retry actions on a variety of failures automatically, minimizing the amount of defensive code and retry logic that is necessary to handle transient failures in common cloud data access scenarios
  • Pri1: A user should be able to implement their own retry strategies and decide which exceptions should cause a retry and which should not.

Non Goals

  • Fully supporting moving unchanged EF-based applications to scenarios with unreliable connections
    • Resiliency is off by default, although we will try to help people know about it by proving information in transient failure exceptions

Design Meeting Notes

http://entityframework.codeplex.com/wikipage?title=Design%20Meeting%20Notes%20-%20November%208%2c%202012
http://entityframework.codeplex.com/wikipage?title=Design%20Meeting%20Notes%20-%20December%206%2c%202012
http://entityframework.codeplex.com/wikipage?title=Design%20Meeting%20Notes%20-%20January%2024%2c%202013

Experience

When using the connection resiliency feature, EF will take care of retrying a command whenever transient errors occur. As an application writer using connection resiliency you still need to handle scenarios where the error is not transient, or is not resolved before the maximum retries have occurred. In the case of an exception that is not retried, for whatever reason, EF will bubble up the exception in the same way it would have if connection resiliency was not enabled.
Connection resiliency should be able to be enabled with a small change to configuration, with the possibility of writing code to implement your own retry strategies if you need to.
What database operations are automatically retried:
  • Executing an EF query:
    • On the DbContext API: DbSet, DbQuery,
    • On the ObjectContext API: ObjectQuery (both for EntitySQL and LINQ to Entities) and CompiledQuery
  • Executing raw Sql:
    • On the DbContext API: SqlQuery and ExecuteSqlCommand
    • On the ObjectContext API: ExecuteStoreQuery and ExecuteStoreCommand
  • Invoking a stored procedure through ExecuteFunction
  • Calling SaveChanges on a DbContext or ObjectContext
  • Executing Migrations
  • Using Load or Reload on DbContext
  • Using Refresh on an ObjectContext

What database operations are not automatically retried:
  • Executing queries in EntityClient, e.g. using EntityCommand

Implementation

Connection retry is taken care of by an implementation of the IDbExecutionStrategy interface. Implementations of the IDbExecutionStrategy will be responsible for accepting an operation and, if an exception occurs, determining if a retry is appropriate and retrying if it is. EF will use an IDbExecutionStrategy each time it executes an operation against the database. The IDbExecutionStrategy interface looks like the following:

public interface IDbExecutionStrategy
{
    bool RetriesOnFailure { get; }

    void Execute(Action operation);
    TResult Execute<TResult>(Func<TResult> operation);

    Task ExecuteAsync(Func<Task> operation, CancellationToken cancellationToken);
    Task<TResult> ExecuteAsync<TResult>(Func<Task<TResult>> operation, CancellationToken cancellationToken);
}

Default Implementations

EF will ship with four execution strategies:
  1. DefaultExecutionStrategy: this execution strategy does not retry any operations, it is the default for databases other than sql server.
  2. DefaultSqlExecutionStrategy: this execution strategy does not retry at all, however, it will wrap any exceptions that could be transient to inform users that they might want to enable connection resiliency.
  3. DbExecutionStrategy: this class is suitable as a base class for other execution strategies. It implements an exponential retry policy, where the initial retry happens with zero delay and the delay increases exponentially until the maximum retry count is hit. This class has an abstract ShouldRetryOn method that can be implemented in derived execution strategies to control which exceptions should be retried.
  4. SqlAzureExecutionStrategy: this execution strategy inherits from DbExecutionStrategy and will retry on exceptions that are known to be possibly transient when working with SqlAzure.
Note: Execution strategies 2 and 4 are included in the Sql Server provider that ships with EF, which is in the EntityFramework.SqlServer assembly.

Using an Execution Strategy

The main way to use execution strategies is to configure EF so that it will use the given strategy for all actions. In order to do this you need to add a resolver to the DependencyResolver chain that EF will use to find an IDbExecutionStrategy when it requires one. There is a sugar method added to DbConfiguration called ExecutionStrategy. To use the SqlAzureExecutionStrategy you would call the method in your DbConfiguration like this:

public class MyConfiguration : DbConfiguration
{
    public MyConfiguration()
    {
        SetExecutionStrategy("System.Data.SqlClient", () => new SqlAzureExecutionStrategy());
    }
} 

The provider invariant name is used as a key for the factory that will return a new execution strategy. You can also use SqlProviderServices.ProviderInvariantName. We accept a factory rather than an instance of the IDbExecutionStrategy because each retriable operation should use a new instance of DbExecutionStrategy.

EF providers have the opportunity to register an execution strategy with the dependency resolver, the SqlServer provider that ships with EF will register the DefaultSqlExecutioStrategy described in the section above. Other providers will return whichever strategy they think is appropriate for their database. If no execution strategy is found then EF will use the DefaultExecutionStrategy.

The other way that you could use an execution policy is to create an instance of the execution policy and use it to execute an action yourself. This might be useful when you do not need to retry for most things, but want to use one for a small number of queries. An example of using the execution strategy this way would be something like this:

var executionStrategy = new SqlAzureExecutionStrategy();
var blogs = executionStrategy.Execute<IEnumerable<Blog>>(GetBlogs);

NOTE: The DbExecutionStrategy base class that SqlAzureExecutionStrategy inherits from is not designed to be thread safe.

Configuring SqlAzureExecutionStrategy

The constructor of SqlAzureExecutionStrategy can accept two parameters, MaxRetryCount and MaxDelay. MaxRetry count is the maximum number of times that the strategy will retry before throwing a RetryLimitExceededException. The MaxDelay is a TimeSpan representing the maximum delay between retries that the execution strategy will use.

public class MyConfiguration : DbConfiguration
{
    public MyConfiguration()
    {
        SetExecutionStrategy("System.Data.SqlClient", () => new SqlAzureExecutionStrategy(1, TimeSpan.FromSeconds(30)));
    }

}

Another way to modify the SqlAzureExecutionStrategy is to add an exception that you want to retry on. The easiest way to do this is to create a new ExecutionStrategy that derives from SqlAzureExecutionStrategy and adds your special case, for example:

public class MySqlAzureExecutionStrategy1 : SqlAzureExecutionStrategy
{
    private List<int> _errorCodesToRetry = new List<int>
        {
            //List custom error codes here.
        };

    protected override bool ShouldRetryOn(Exception exception)
    {
        var sqlException = exception as SqlException;
        if (sqlException != null)
        {
            foreach (SqlError err in sqlException.Errors)
            {
                // Enumerate through all errors found in the exception.
                if (_errorCodesToRetry.Contains(err.Number))
                {
                    return true;
                }
            }
        }
        return base.ShouldRetryOn(exception);
    }
}

User Transaction

One requirement of using connection resiliency with EF is that EF is responsible for the creation and management of transactions. This is because EF will need to rollback and retry the current transaction in order to retry, which we cannot do with a transaction that was created and passed to EF. You can work around this by wrapping your entire logic, including the creation of your transaction inside an explicit usage of an execution strategy. More details on this, and examples of how to handle user created transactions, can be found here: http://msdn.microsoft.com/en-US/data/dn307226

Buffering of Query Results

Buffering is required when using connection resiliency in order to keep the state manager consistent when there is a retry. However, there are also other benefits to buffering the main ones being:
  1. The connection is potentially open and in-use for a shorter period of time
  2. Multiple Active Result Sets (MARS) would not be needed for nested queries, such as those that come with lazy loading.
Because of this buffering is now the default. We will provide an AsStreaming method on IQueryable to allow the old behavior, similar to the AsNoTracking method for non-tracking queries.

Implementation

When in streaming mode EF uses a data reader that is wrapped in an IEnumerable that reads from the data reader as rows are required for materialization.
With the buffering enabled the process is the same, except that the underlying provider DataReader will be read into memory, and close the connection, before it begins to yield rows from in-memory collection of data.

AsStreaming

An AsStreaming extension method will be added to IQueryable that will return a query that will use the streaming behavior described above.

Last edited Mar 6 at 11:07 PM by AndriySvyryd, version 18

Comments

AndriySvyryd Jul 18, 2013 at 9:48 PM 
@kostat_plex
WCF service with transactional operations recomendation doesn't apply to EF with SQL Azure at this time as it offers different guarantees from a traditional SQL Server. If the transaction should only wrap one operation at a time it can be replicated with EF. You would only need to disable the transaction creation in your code and setup SqlAzureExecutionStrategy to have each EF operation executed in a transaction.
The problem with outer transaction is when the connection fails at any time after the first operation has been executed as the transaction will be rolled back, but only the last operation can be retried.
You should also avoid using TransactionScope with async methods as it will not be flown through reliably.

kostat_plex Jul 7, 2013 at 8:52 AM 
@AndriySvyryd:
You said: [Outer transactions aren't supported because when an operation that is part of an existing transaction fails then the transaction is rolled back...]. If I understand correctly the connection reliability problem, it shouldn't affect the current transaction except the case when the DB correctly processed the request, but the response was lost. This case should be equally rare in cloud and LAN environments.
My question is: what problems you expect if I suppress existing transaction test and remove TimeoutException from ShouldRetryOn logic?

kostat_plex Jul 6, 2013 at 1:43 PM 
@AndriySvyryd [May 9 (transactions)]
The most common scenario is WCF service where the transaction support is specified by OperationBehavior attribute. I'm talking about the most widely used scenario when transaction simply "wraps" the operation. But since it's started and completed by the framework, the solution you suggest cannot work.
So my question is: "What's the MS suggested way to work with SqlAzureExecutionStrategy in case of a standard WCF service with transactional operations (widely recommended by MS)"?

alexdresko Jun 5, 2013 at 2:28 PM 
Just installed Beta 1 and I get this error on build..:

Error 1 The type or namespace name 'SqlAzureDbConfiguration' could not be found (are you missing a using directive or an assembly reference?)

AndriySvyryd May 8, 2013 at 11:32 PM 
@GuardianAngel Outer transactions aren't supported because when an operation that is part of an existing transaction fails then the transaction is rolled back, but the retry strategy isn't aware of what other operations were executed in that transaction previously so it can't retry them.

SupportsExistingTransactions flag is for strategies that don't ever retry the execution.

The execution strategy does support transactions started inside the retryable action and actually in most cases we start a new transaction if one didn't exist.

If in your scenario you need to execute several operations in a single transaction then instead of globally enabling SqlAzureExecutionStrategy you can create a new instance of it and call Execute passing in the block that starts the transaction and executes the operations.

Also if this is the case please share your scenario as we are looking for ways to make this experience better.

GuardianAngel Apr 25, 2013 at 1:07 PM 
Hi,

we use the new SqlAzureExecutionStrategy and i have seen that there is a flag SupportsExistingTransactions which is default set to false and block using of transactions. In our case we get the exception "Existing transactions are not supported with the current execution strategy.". For our application we need transaction. So my question is why transaction is not supported with SqlAzureExecutionStrategy or how can i use transaction with SqlAzureExecutionStrategy?

AndriySvyryd Apr 18, 2013 at 10:31 PM 
@fhurta When talking about the EntityClient we are referring to using its API directly. LINQ to entities and almost any other methods will support connection resiliency.

fhurta Apr 18, 2013 at 9:52 AM 
Question to following statement from the post:
"What database operations are not automatically retried:
Executing queries in EntityClient, e.g. using EntityCommand"

So speaking about database-first approach. As EntityClient is default data provider for EF, does it mean that using e.g. LINQ to entities will not do retries until I change the connection string - providerName attribute from System.Data.EntityClient to some another provider?

cincura_net Mar 13, 2013 at 2:45 PM 
No real reason/advantage. I was just asking.

AndriySvyryd Mar 12, 2013 at 7:20 PM 
@cincura_net In this case a generic method doesn't seem to give any advantages, so we chose the simplest form. If you can think of a reason why a generic method would be better please let us know.

cincura_net Mar 11, 2013 at 5:01 PM 
Hi *,

I know it's basically the same, but did you considered also instead of "ShouldRetryOn(Exception ex)" create "ShouldRetryOn<TException>(TException ex)"? Similar for "GetNextDelay(Exception lastException)". Just curious.

JC

AndriySvyryd Mar 6, 2013 at 6:46 PM 
@juliandominguez IExecutionStrategy does support async, the async methods were ommited from the spec to reduce noise.

We assume that the implementations of IExecutionStrategy, IRetriableExceptionDetector and IRetryDelayStrategy implementations are statefull and therefore create a new instance of IExecutionStrategy implementing class for each action that can be retried, it in turn can create new instances of IRetriableExceptionDetector and IRetryDelayStrategy.

We have looked a Topaz, but we decided to have a more streamlined API since we would only need a subset of its functionality.

juliandominguez Mar 1, 2013 at 1:15 AM 
First of all, this is great news, and I strongly agree that the user needs to explicitly tell the code to do retries, and it should not happen automagically, and potentially block a UI that was not meant for waiting very long.
It does not look that IExecutionStrategy supports async (Tasks), and would just retry synchronous calls. In a highly concurrent scenario you would not want to block the threads for doing DB calls, and even less would want to block a thread when waiting for the next retry to occur. Highly concuncurrent scenarios is where you would typically get throttled the most anyway, so I'd believe having a way to retry async calls is a must.
Also, is the IRetryDelayStrategy instance that you use stateful? Or how would you implement an incremental or exponential backoff strategy if it's stateless? (because from the looks of the API, it does not receive how many retries it did before to be able to calculate the next delay).
Have you looked at Topaz for input? http://msdn.microsoft.com/en-us/library/hh680901(v=pandp.50).aspx

divega Feb 28, 2013 at 6:39 AM 
Hi Unai,

Good catch! The method is currently defined only in SqlAzureDbConfiguration but we are planning to move it up to DbConfiguration directly after Alpha 3. We are discussing the usefulness of keeping SqlAzureDbConfiguration as a shortcut to configure an execution strategy that is optimized for SQL Azure but potentially for other cloud oriented optimizations we might implement in the future.

UnaiZorrilla Feb 27, 2013 at 9:53 PM 
Hi,

Why not a SetExecutionStrategy(...) method in DbConfiguration???