Entity Framework Design Meeting Notes

The Entity Framework team has a weekly design meeting in which we discuss/recap design and other issues with the codebase. These are the notes from those meetings. The intention is to provide a history of what decisions have been made and why. No attempt is made to go back and update notes from older meetings if we later change a decision and decide to do something different.

November 8, 2012

Lightweight conventions

After playing with the lightweight convention APIs there was a feeling that the APIs could be made:

  • Cleaner/simpler
  • More consistent with existing APIs/patterns
  • Better support for DRY code in common cases

Specific changes:

  • One of the nested closures can be removed if the lightweight APIs follow more of the fluent/builder pattern rather than always creating something that is added to the conventions list. To do this, there needs to be two entry methods: one for entities and one for properties. One consequence of this is that putting these methods on Conventions, which follows the collection pattern in a similar way to Configurations, would be mixing the two patterns. Therefore, the plan now is to move back to hanging these methods directly off the model builder. For example:
    modelBuilder.Entities()
        .Configure(c => c.HasKey("Key"));
    modelBuilder.Properties()
        .Configure(c => c.HasPrecision(5, 5));
    
    • This now makes it not completely obvious that these are conventions and not configurations. We could add the word “conventions” to the method names to make this clearer, but we don’t believe that the distinction is relevant for the vast majority of uses. In cases where it may be relevant (such as the fact that calling Entities doesn’t add entities to the model, and also, possibly, ordering) it is relatively easy to discover from Intellisense and documentation what the behavior is and how it works.
  • Allowing generic versions of these methods that accept base types or interfaces makes the experience much better for cases where such a base type or interface exists. This is very similar to the ChangeTracker.Entries case. For example:
    modelBuilder.Entities<IEntity>()
        .Configure(c => c.HasKey(x => x.Key));
    modelBuilder.Properties<decimal>()
        .Configure(c => c.HasPrecision(5, 5)); 
    
  • The current use of properties inside the configuration should be changed to use methods for several reasons:
    • When the value of the property is read it is not at all clear what this value means. Specifically, it is the value that has been configured, if any. If no value has been configured, then the value is null and not the default value that will ultimately make it into the model. If you want to read the model and perform operations based on this then a model convention is the way to do it.
    • The methods match the existing well-known methods that are already used in the existing fluent API.
    • Chaining of methods can now happen in the same way as it does for the existing fluent API. For the generic entities method this chaining can also be restricted based on type, just as in the existing fluent API. For places where there is not enough information to restrict it, the chaining is flattened just as it is at the top level. Chaining also means that the overload of IsKey that takes a column order is not really needed: you can just chain a call to HasColumnOrder.
  • The idea of being able to drop down to properties (that was discussed previously) should be included. This allows a convention to filter by entity and then easily do configuration of its properties. For example:
    modelBuilder.Entities<IEntity>()
        .Where(e => e.Name.EndsWith("Foo"))
        .Configure(c => c.Property(e => e.Bar).HasPrecision(5, 5)); 
    
  • It seems reasonably common that the results of filtering in a Where will also be needed in the Configure method. Rather than having to write this code twice or factor it out we are introducing a Having method. For example:
    modelBuilder.Properties<decimal>()
        .Having(p => p.CustomAttributes.OfType<DecimalPrecisionAttribute>().FirstOrDefault())
        .Configure((c, a) => c.HasPrecision(a.Precision, a.Scale));
    
    • Having does two things:
      • If the lambda returns null, then the entity or property is filtered out in the same way as it would be if the predicate in Where returned false.
      • If the lambda returns non-null then application of the convention continues with the non-null return value being passed to the Configure method. This means that the Configure method after a Having takes a two argument lambda.
    • We may or may not allow Where to be chained after Having. Allowing it doesn’t seem super-compelling, but neither does leaving it off.
    • When now becomes Where again to fall in line with the Having pattern. It is possible to have a dangling Where or Having; we will either throw or do nothing for these cases.
    • We considered special casing IEnumerables in Having such that returning an empty enumeration is equivalent to returning null, but this adds complexity without much value.
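Putting the bullets above together, a mock-up of the chaining behavior (the API shown is the proposal under discussion, not shipped code):

```csharp
// Mock-up only: generic entry method, Where filter, and chained configuration.
// Because HasColumnOrder can be chained after IsKey, the IsKey overload that
// took a column order is no longer needed.
modelBuilder.Properties<int>()
    .Where(p => p.Name.EndsWith("Key"))
    .Configure(c => c.IsKey().HasColumnOrder(0));
```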

Connection resiliency to transient failures

Goals

  • Good out-of-box experience with SQL Azure
    • Mechanism is pluggable and can be associated with providers to allow it to be used in other situations

Non-Goals

  • Fully supporting existing EF applications without changes being made to those applications
    • Resiliency is off by default, although we will try to help people know about it by providing information in transient failure exceptions

Scope

  • DbContext API (SaveChanges, queries, possibly commands)
  • ObjectContext API as needed
    • Often implementation will be at the ObjectContext layer anyway
  • Ability to apply the retry policy on other methods manually

Assumptions

  • SQL Azure can drop connections due to:
    • Excessive resource usage
    • Long-running transactions
    • Failover and load balancing actions
    • Federations rebalancing
    • Network connectivity issues
  • Transaction issues:
    • SQL Azure does not support distributed transactions
    • SqlClient does not support nested or parallel transactions
    • Transient failures cause the connection to close and the transaction to roll back

Proposal

  • SaveChanges
    • Create and use one local transaction for all commands, as there is no way of dividing them into retryable chunks while keeping the ability to roll back all of them if a non-transient error is encountered
  • Queries
    • Buffer the results (at DbDataReader level) before returning them if retry policy is enabled.
      • We will likely switch to buffering by default due to other benefits
      • Buffering at the data reader level keeps the state manager state consistent even if there is a retry
    • Allow the retry policy (and thus buffering) to be deactivated for individual queries whose results could be too large to fit in memory (for example, via an AsStreaming method)
  • Raw SQL execution and other methods
    • Do use the retry policy
    • Always enabled for EntityConnection Open and BeginTransaction
  • Policy behavior
    • Presented as a provider service
    • Can be changed using Service Locator
    • User can get the current policy for the current provider
    • Only retries on transient errors, can be overridden
    • Throws when enabled and Transaction.Current is not null; may be possible to override this
    • Can configure streaming/buffering on a per-query basis
    • Algorithm for determining the delay between retries can be overridden
    • If the retry limit is reached, throw an exception suggesting that operations be broken into smaller transactions; this can be turned off
    • Don't provide a Retrying event to keep RetryPolicy immutable. Tracing feature should be enough to cover most use cases. Alternatively we could add a callback as a constructor parameter.
  • SQL Azure specific retry policy
    • By default is not enabled
    • When enabled it will retry for SaveChanges, tracking queries, and no-tracking queries, but each can be opted out of individually.
    • Exponentially increasing wait period with a small random factor; the first delay is close to 0.
    • Retry limit of 5 (about 1 minute of delay in total); consider 15 seconds
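The delay behavior described above (exponentially increasing waits with a small random factor, first delay close to zero, limit of five retries) can be sketched as follows; the class and constant names are illustrative, not the planned API:

```csharp
using System;

public static class RetryDelaySketch
{
    private const int MaxRetryCount = 5;
    private static readonly TimeSpan MaxDelay = TimeSpan.FromSeconds(30);
    private static readonly Random _random = new Random();

    // Returns null when the retry limit has been reached, at which point the
    // caller throws an exception suggesting smaller transactions.
    public static TimeSpan? GetNextDelay(int retriesAttempted)
    {
        if (retriesAttempted >= MaxRetryCount)
        {
            return null;
        }

        // (2^n - 1) is 0 for the first attempt and grows exponentially after
        // that; the random factor spreads out retries from concurrent callers.
        var randomFactor = 1.0 + _random.NextDouble() * 0.1;
        var delta = (Math.Pow(2.0, retriesAttempted) - 1.0) * randomFactor;
        return TimeSpan.FromSeconds(Math.Min(delta, MaxDelay.TotalSeconds));
    }
}
```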

Open Issues

  • Should the default streaming be at the data reader level or at the ObjectResult level?
    • Start with data reader level; this will use more memory but means buffering is only used at one place and behavior is more consistent with current behavior
  • Should we wrap transient exceptions with a more helpful message if SQL Azure retry policy isn't enabled?
    • We should try this, making sure to keep the inner exception the same for people currently using bespoke retry code
  • Possible breaking change (Different merge options, interrupted streaming, etc.)
  • Should NoTracking, SqlQuery (DbSet version and/or Database version) and SqlCommand methods use the policy?
    • Yes for all queries
    • For commands more work is needed to understand the implications of introducing a transaction

October 25, 2012

Change default mapping for DateTime

Currently Code First maps DateTime properties to the SQL datetime type. This is a problem because:

  • DateTime is a value type, which means if no value is set for the DateTime property then the default value is used rather than null as would be the case for reference types
  • The .NET default value for DateTime does not fit into a SQL datetime column
  • This results in an exception that is not easy to interpret: “The conversion of a datetime2 data type to a datetime data type resulted in an out-of-range value.”
  • All of this happens just by creating a new object and trying to save it, and it’s not clear at all what the problem is.
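A minimal repro of this failure (BlogContext and the CreatedOn property are hypothetical; any unset DateTime property mapped to the SQL datetime type behaves this way):

```csharp
// CreatedOn is never set, so it stays at DateTime.MinValue (0001-01-01),
// which is below the minimum of the SQL Server datetime type (1753-01-01).
using (var context = new BlogContext())
{
    context.Blogs.Add(new Blog());
    context.SaveChanges(); // throws the out-of-range conversion error above
}
```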

The solution is to change the default mapping from datetime to datetime2. However, this is a breaking change because it changes the model that Code First generates, meaning that our model checking (using Migrations) will indicate that the model has been changed even when the user didn’t make a change. This could result in runtime exceptions.

Options:

  • Make the change and direct people to solutions for either migrating their databases or reverting the mapping using an attribute or the fluent API
    • Do this, but allow older model builder versions to use the existing mapping. This is a bit tricky because it is provider-specific behavior.
  • Do nothing and keep letting people know how to fix the issue. The workarounds are easy once the problem is understood. Bing finds solutions quite quickly.
  • Consider improving the exception message, possibly by detecting the situation in the provider and throwing from there instead of letting the server throw.
    • Need to be careful to only throw when really needed. For example, if server really has datetime2 even though SSDL has datetime, then we don’t want to throw. It’s pretty much impossible to know this.

Decision:

We will leave the behavior as is. The breaking change and the confusion it will cause is not justified. Existing workarounds are reasonable and easy to find with a Bing search. Also, with lightweight conventions it should be easy to add a convention that changes the mapping for all DateTime types if desired—we will use this as an example for lightweight conventions.
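A sketch of such a convention, using the lightweight conventions API shape discussed in these notes (all names are provisional):

```csharp
// Provisional lightweight convention: map every DateTime property to
// datetime2 instead of the default datetime mapping. Explicit per-property
// configuration would still take precedence.
modelBuilder.Conventions.Add(
    entities =>
        entities.Properties()
            .Configure(config => config.ColumnType = "datetime2")
            .When(property => property.PropertyType == typeof(DateTime))
);
```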

Lightweight conventions

There are a couple of minor issues/questions with the lightweight conventions API:

1. It is possible to call Configure multiple times in one Add call. What should happen? Throw? Last wins? Likewise, it is possible to call Where without anything following it. Again, what should happen? Conclusion: change the API such that Where becomes When and is always chained after Configure. For example:

    modelBuilder.Conventions.Add(
       entities =>
           entities.Properties()
               .Configure(config => config.Precision = 10)
               .When(property => property.PropertyType == typeof(decimal))
    );

With this pattern if multiple calls to Configure are placed in a single Add block it will act like calling Add multiple times. Also, When comes after Configure so there are no dangling Wheres.

2. Should it be possible to chain multiple calls to Add? Decision: no compelling argument either way, but for consistency and to make it easier to change in the future we will currently keep a void return and prevent chaining. One option for the future is to return something that can then be used with Remove to remove the convention.

General note: the Code First configuration APIs should be consistent in this as much as possible.

DbConfiguration exploratory testing

Discussion of experience following DbConfiguration exploratory testing:

  1. EF does not use the default connection added by ASP.NET. This can be confusing. However, EF should use the connection if ASP.NET scaffolding is used—more testing to be done here. Also, the One EF tooling experience could configure the context to use the ASP.NET connection if present. (http://entityframework.codeplex.com/workitem/624) We can’t really change EF to use the default connection in the general case without it being an unacceptable breaking change.
  2. It is not always obvious that settings in the config file are overriding settings made in code-based configuration because making the connection between the XML and the code is not super-obvious.
    1. Could we only have the config file override the code-based if a “force” flag or similar is used? This is a bit ugly and would be a change to the current EF5 behavior.
    2. Should we stop manipulating the config file from NuGet and instead generate code-based configuration always? What about apps that already have existing config files? We will look into this as part of the other work being done on configless EF.
    3. We should definitely update the DbConfiguration documentation to make the connection between the code-based configuration and the config file clearer. (http://entityframework.codeplex.com/workitem/558)
  3. Filed bug “Any DbContext.Configuration change in DbInitializer persists for the first SaveChanges, but not the consecutive ones” (http://entityframework.codeplex.com/workitem/608)
  4. People don’t know what the provider manifest token is. Should we rename it? The problem is that it is part of the XML schema, so we would end up breaking/complicating the schema or having two names for it. Most people don’t need to know what it is anyway, so it may be okay to leave as is.
  5. Provide additional documentation on what the DbConfiguration methods do. (http://entityframework.codeplex.com/workitem/374)

Code Contracts discussion

Investigation of using Code Contracts static analysis showed that it is possible to run static analysis after recent bug fixes. The results were:

  • Took seven hours to complete and used 4GB memory
  • Attempt to use static analysis caching caused crash
  • Static analysis was able to make some assumptions about pre/post conditions to use for the analysis:
    • 8000 suggested pre-conditions
    • 23000 suggested post-conditions
    • 27 possible errors—these will be investigated (http://entityframework.codeplex.com/workitem/625)
    • There is not enough information in these suggestions to use them as a basis for creating more explicit conditions

What are the benefits/costs of continuing to use Code Contracts?

  • We don’t use much more than simple pre-condition checking right now; we could do more, but it’s not a priority, so it seems unlikely that we will
  • Static analysis is too slow/buggy to get much benefit
  • One of the main reasons for using it was to avoid manually adding the parameter name to simple null/string checks. We can now do this with the CallerXXX attributes instead.
  • We could ship a contracts assembly, but we don’t currently and there doesn’t seem to be much demand for it.
  • Contracts automatically compiles out internal stuff for release builds. This is nice, but we could live without it by leaving the internal checks intact or by conditionally compiling them.
  • Build is slow because of the rewriting. We could try to make it faster by doing less rewriting or by rewriting on release builds only, but neither of these options is ideal.
  • In general, we have run into several bugs (e.g. compiling out public surface contracts when it shouldn’t, creating unverifiable code, crashing on caching) that reduce the value proposition.

Conclusion: we will talk to the Contracts team about whether to keep using Code Contracts and, if so, how to get the best value from them.

October 11, 2012

Lightweight conventions

[Aka set-based configuration/batch configuration/bulk configuration. Given where the design is evolving we’re back to calling these lightweight conventions again.]

As previously stated, it is important that explicit configuration of an entity/property should take precedence over batch/bulk/set configuration. Also, unlike single entity configuration, using batch/bulk/set configuration does not add types to the model. Finally, not all types in the model have configurations associated with them at the time the calls to the API are made, which makes implementing them as configuration changes hard. All this together means that they fit the mental model for conventions better than that for configurations, and they are probably best implemented as conventions. We are therefore leaning towards making the Conventions.Add call be the entry point and calling these lightweight conventions.

It is important that when using lightweight conventions the developer doesn’t have to figure out where to insert the conventions in the convention list—they should just have to call Add and it will work.

The following table shows mock-ups of how various scenarios are handled by the existing, single-entity fluent API, by the full conventions infrastructure, and by possible options for lightweight conventions. Notes are below the table.

Each scenario below shows the existing fluent API usage and the equivalent full convention, followed by one or more lightweight convention mock-ups.

modelBuilder
    .Entity<MyEntity>()
    .Property(e => e.MyDecimal)
    .HasPrecision(8, 4);

 

// Full convention
public void Apply(
    PropertyInfo property,
    Func<DecimalPropertyConfiguration> configuration)
{
    configuration().Scale = 4;
}


 

//Casting (See {Note 1})
modelBuilder.Conventions
.Add(
    entities => entities
        .Properties()
        .Configure(
            propConfig =>
            ((DecimalTypeConfig)propConfig).Scale = 4));

//OfType method          
modelBuilder.Conventions
    .Add(
        entities => entities
            .Properties()
            .OfTypeDecimal()
            .Configure(propConfig => propConfig.Scale = 4));
 
//Type in root property method
modelBuilder.Conventions
    .Add(
        entities => entities
            .DecimalProperties()
            .Configure( propConfig => propConfig.Scale = 4));

//Flatten config so all possibilities are on all configs
modelBuilder.Conventions
    .Add(
        entities => entities
            .Properties()
            .Where(p => p.PropertyType == typeof(decimal))
            .Configure(config => config.Scale = 4));

modelBuilder
    .Entity<MyEntity>()
    .HasKey(e => e.Key);

// Full convention
public void Apply(
    Type type,
    Func<EntityTypeConfiguration> configuration)
{
    var keyProperty = type.GetProperty("Key");

    if (keyProperty != null)
    {
        configuration().Key(keyProperty);
    }
}

See {Note 2}

modelBuilder.Conventions
    .Add(
        entities => entities
            .Properties()
            .Where(p => p.Name == "Key")
            .Configure(config => config.IsKey()));

modelBuilder
    .Entity<MyEntity>()
    .ToTable("MY_ENTITY");

// Full convention
public void Apply(
    Type type,
    Func<EntityTypeConfiguration> configuration)
{
        configuration().ToTable(
            Regex.Replace(
                type.Name,
                "([a-z])([A-Z])",
                "${1}_${2}").ToUpper());
}

modelBuilder.Conventions
    .Add(
        entities => entities
            .Configure(
                config => config
                    .ToTable(
                        Regex.Replace(
                            config.ClrType.Name,
                            "([a-z])([A-Z])","${1}_${2}")
                            .ToUpper())));

modelBuilder
    .Entity<MyEntity>()
    .Property(e => e.MyDateTime)
    .HasColumnType("datetime2");

// Full convention
public void Apply(
    PropertyInfo property,
    Func<DateTimePropertyConfiguration> configuration)
{
    configuration().ColumnType = "datetime2";
}

See {Note 3}

modelBuilder.Conventions
    .Add(
        entities => entities
            .Properties()
            .Where(p => p.PropertyType == typeof(DateTime))
            .Configure(c => c.ColumnType = "datetime2"));

//entities.Where().Ignore();

//entities.Properties.Where().Ignore();

entities.Where().Configure(c => c.Ignore()); // See {Note 4}

entities.Properties().Configure(c => c.Ignore());

modelBuilder
    .Entity<MyEntity>()
    .Property(e => e.MyString)
    .IsUnicode(false);

 

// Full convention
public void Apply(
    PropertyInfo property,
    Func<StringPropertyConfiguration> configuration)
{
    configuration().IsUnicode = false;
}


 

modelBuilder.Conventions
    .Add(
        entities => entities
            .Properties()
            .Configure(config => config.IsUnicode = false));

Notes:

  • {General notes}
    • It is okay for lightweight conventions to not cover all possible cases; full conventions provide building blocks for that.
    • The Conventions.Add method is now overloaded to take something like a ConventionBuilder instance.
    • Do we even need the Configure method or should we have a fluent API after Properties()/Entities()?
      • Decision: Keep the Configure method; this is a fairly advanced API anyway and it makes doing multiple configurations in one call easier
    • Should configuration members be methods like fluent API or more traditional properties?
      • Decision: Fluent API is write-only so methods make sense. This is read-write, so use more traditional approach.
    • The configuration passed to the lightweight convention must handle not overwriting explicit configuration. The developer should not have to do this.
      • This can be done by using a proxy for the configuration that checks whether or not configuration has already been explicitly set.
  • {Note 1} Should the API provide the ability to filter by property type?
    • Using a generic doesn’t really work because, for example, there is no way to use a generic constraint to change the API based on something like “decimal”
    • Using method names like OfTypeDecimal or DecimalProperties could work, but it doesn’t buy very much simplicity and also makes it harder for configuration that could be applied to multiple types of property but not to all.
    • Decision: use flattening and make all configuration options available on all properties regardless of type
      • If configuration cannot be applied to a property (e.g. Precision to a string property) then it will be a no-op
      • If configuration can be applied but it results in an invalid model then model validation will signal the error
      • If configuration can be applied to multiple property types but only properties of a certain type should be configured (e.g. MaxLength to strings/binaries) then the developer will have to filter with Where or similar.
  • {Note 2} Normally keys are configured from the entity, not on the property. Do we want to do the same here?
    • Configuring on the entity allows composite keys with order to be easily configured.
    • But the common case for a convention is to make any property that matches a pattern (e.g. ends with Key) be a key. This is harder to do if the API is based off the entity, because property matching would have to be done for every property in the entity.
    • Decision: consider this to be like adding the KeyAttribute annotation. It is therefore on the property and the common case is easy. Composite key configuration can be done explicitly or with a full convention.
      • Also, allow IsKey method to take an index for ordering of composite keys. This means if every entity has something like a federation ID that is always the first then it can still be done with the simple API
      • Potentially the With method could make this easier without needing IsKey for the property; Diego to come up with ideas.
  • {Note 3} Where
    • Should we have Where method or make all filtering be done in Configure method?
      • Decision: For now we will keep Where method; it really helps the experience.
    • Should Where take PropertyInfo or configuration object?
      • Decision: PropertyInfo since this is more intuitive and matches better to existing mental model for Where
    • What about using a method like With instead that would allow the results of evaluating the Where predicate to be used in the closure?
      • Diego to share ideas on this
  • {Note 4} Should Ignore for an entity be top-level or on the configuration
    • Probably on the configuration, but model discovery implications of Ignore may make this hard to implement either way
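A mock-up of the {Note 2} decision, including the IsKey overload that takes an index for composite key ordering (all names are illustrative):

```csharp
// Every entity has a federation ID that should always be the first part of
// the composite key; IsKey(0) places it first without configuring the key
// from the entity side.
modelBuilder.Conventions
    .Add(
        entities => entities
            .Properties()
            .Where(p => p.Name == "FederationId")
            .Configure(config => config.IsKey(0)));
```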

October 4, 2012

Global spatial provider

The initial spatial support in EF5 added support for spatial types backed by provider-specific native spatial libraries. This should work in most cases with providers other than SQL Server. However, creating stand-alone DbGeography and DbGeometry types for other providers was not possible because there was no way to register the other provider with EF. This has been fixed in EF6 by allowing the registration of a global spatial provider. However, this is not an ideal solution because it means that only one spatial provider can be used in an app domain. A better solution would be to change the design of DbGeography and DbGeometry to decouple them from the native provider except when necessary, and, when it is necessary, to do this in a way that makes the provider to use explicit. It would also be worth considering using the System.Spatial library and deprecating DbGeography and DbGeometry.

Decisions:

  • Updating the spatial work is not high enough priority for us to work on now but we will create a work item for it.
  • We will retain the global provider, and also check in the ability to set the global provider in the config.

Set-based configuration

There are not very many parts of the single-type API, if any, for which there is no scenario for set-based application. So our initial plan will be to assume that we are implementing all, or at least the majority, of the current fluent API, but with a set-based approach that applies configuration to a group of entities, complex types, or properties.

The table below shows side-by-side syntax for configuring some common scenarios, one applying it to a single thing and the other to a set. The syntax is based on some of the feedback and discussions from last week; you would be able to pass a lambda into the Entities and ComplexTypes methods to filter the types you are applying it to.

Each scenario below shows the single-type syntax followed by the set-based syntax.
Set decimal property precision on entity

modelBuilder.Entity<Blog>()
    .Property(x => x.DecimalProperty)
    .HasPrecision(18, 8);

modelBuilder.Entities()
    .Properties<decimal>()
    .HasPrecision(18,8);

Set Key of entity

modelBuilder.Entity<Blog>()
    .HasKey(x => x.BlogKey);

modelBuilder.Entities()
    .Properties<int>(x=>x.Name.EndsWith("Key"))
    .IsKey();

Configure property on complex type

modelBuilder.ComplexType<Address>()
    .Property(x => x.DecimalProperty)
    .HasPrecision(10, 2);

modelBuilder.ComplexTypes()
    .Properties<decimal>()
    .HasPrecision(10,2);

Given this approach, if I want to specify that all decimal properties have a given precision, regardless of whether they are on a complex type or an entity, then I need to define it twice, which is not as DRY as it could be.

Options:

  • Do nothing, and tell people to write a property convention if they really want to do it in a single place. This still scatters code a little bit, as the convention will be in a different location to the rest of the code.
  • Define a third root to use for when you are configuring both entities and complex types. This would look something like this:
        modelBuilder.Types() 
            .Properties<decimal>()
            .HasPrecision(18,8);
            //Types is not a really good name, but it will do for now.
  • Implement a method similar to Linq Concat. This could look something like this:
        modelBuilder
            .Entities().Concat(modelBuilder.ComplexTypes())
            .Properties<decimal>()
           .HasPrecision(18,8);
  • Have a top-level Properties method
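A sketch of the last option, a top-level Properties method (hypothetical; it would reach properties on both entities and complex types):

```csharp
// Configures every decimal property in the model, whether it lives on an
// entity or on a complex type.
modelBuilder.Properties<decimal>()
    .HasPrecision(18, 8);
```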

Decisions/ideas from the meeting:

  • The vast majority of complex types are used as properties on entities. This means that when you configure something like precision for all properties of an entity you are also configuring for all properties of the complex type on that entity, and on child complex types of the complex type, and so on. This means we probably only need to have Entities at the top level—we’ll start with this.
  • A good name to use with the nested closure is “Configure” with configuration object and property passed in.
  • We’ll create an initial one-pager for this work and post to CodePlex

September 27, 2012

Dependency resolver scoping

It is possible that in the future EF will need to resolve services within a given scope and release (dispose of) those services when the scope ends. The obvious example of this is services that are used by a given context instance and then released when the context is disposed.

This kind of scoping can be further broken down as follows:

  • The ability to know when a scope is starting and when it ends. This is important when using IoC containers because they often have mechanisms for beginning and ending scopes which are designed to handle the lifetimes of services in the scope.
  • Knowledge about the type of scope being created. For example, is the scope for a context lifetime, for a connection lifetime, or for something else that we haven’t even thought of yet?
  • Specifics about the scope. For example, services might need to be resolved one way for BlogContext, but a different way for MembershipContext.

Previous ideas

The Release method was intended to solve part of this problem—namely the releasing (disposal) of services when they go out of scope. However, it doesn’t integrate well with IoC containers (as per feedback from MVC team) and doesn’t cover the other aspects of scoping. We also never currently call Release for any service. Therefore the Release method will be removed.

The WebAPI mechanism for scoping doesn’t provide any information about the scope and also adds complexity to the dependency resolver interfaces. It also assumes only one dependency resolver, rather than the chain of resolvers that we have. It seems inappropriate to follow the WebAPI design given that we don’t even need scoping at this point and that the WebAPI design doesn’t meet the general requirements we have.

Prototype/proposal

The current proposal is to do nothing now (YAGNI principle) but to prototype a mechanism that can work if/when we need it.

The mechanism of this proposal/prototype is to use the IDbDependencyResolver against itself as the general mechanism for resolving scopes. For example, let’s say that EF needs to resolve services scoped by a context instance. To do this the context instance will ask the global dependency resolver for a new resolver and store that resolver:

    _resolver = DbConfiguration.GetService<IDbDependencyResolver>(context.GetType());

We would have a default implementation of this in the root resolver as we do for other services. Someone using an IoC container would add a resolver that uses the IoC container’s scoping mechanism to return a new resolver. For example:

public class UnityResolver : IDbDependencyResolver, IDisposable
{
    private readonly IUnityContainer _container;

    public UnityResolver(IUnityContainer container)
    {
        _container = container;
    }

    public object GetService(Type type, object key)
    {
        if (type == typeof(IDbDependencyResolver)
            && typeof(DbContext).IsAssignableFrom(key as Type))
        {
            return new UnityResolver(_container.CreateChildContainer());
        }

        return null;
    }

    public void Dispose()
    {
        _container.Dispose();
    }
}

Notice that the key object is used to provide information about the scope that is being created. In this case we are passing the context type, which indicates both that this is a scope for a DbContext instance and which context type it is for. The key used could be anything that makes sense for the scope being created so long as it is well-defined.

After the context has obtained the scoped resolver it can be used to request scoped services in the normal way:

    _cacheKeyFactory = _resolver.GetService<IDbModelCacheKeyFactory>();

The code that created the scoped resolver can choose what to do when services are not resolved by the scoped resolver. In some cases it may be appropriate to ask the global resolver to resolve the service. In other cases, especially when the service must be disposed, it may be required that the scoped resolver always resolves the service and the caller will throw if that doesn’t happen.
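As a purely hypothetical sketch (the wrapper type and its fallback policy are not part of the proposal), the fall-back-to-global behavior could be packaged like this:

```csharp
// Hypothetical sketch: wraps a scoped resolver and falls back to a parent
// resolver (e.g. the global one) when the scoped resolver returns null.
public class FallbackResolver : IDbDependencyResolver
{
    private readonly IDbDependencyResolver _scoped;
    private readonly IDbDependencyResolver _fallback;

    public FallbackResolver(IDbDependencyResolver scoped, IDbDependencyResolver fallback)
    {
        _scoped = scoped;
        _fallback = fallback;
    }

    public object GetService(Type type, object key)
    {
        // Null from the scoped resolver means "not resolved here";
        // give the parent resolver a chance.
        return _scoped.GetService(type, key) ?? _fallback.GetService(type, key);
    }
}
```

For services that must be disposed with the scope, a caller would instead skip the fallback and throw when the scoped resolver returns null.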

When the scope ends (e.g. the context is disposed) then the scoped resolver will also be disposed if it implements IDisposable.

public virtual void DisposeContext()
{
    var disposableResolver = _resolver as IDisposable;
    if (disposableResolver != null)
    {
        disposableResolver.Dispose();
    }
}

Additional notes:

  • General proposal accepted; we won’t implement anything now.
  • Release method will be removed.
  • If we do implement this, then consider
    • Having a way to call the root scoped resolver if services are not resolved by the resolver in use
    • Adding an explicit interface that also includes Dispose or equivalent since there is nothing in the API that requires the scoped resolver to return an IDisposable
    • Sugar methods for registering a scoped resolver

Sugar methods for services

The fundamental building block for configuring EF is to add a dependency resolver on DbConfiguration. For example, the following sets a default connection factory when used in your DbConfiguration constructor:

AddDependencyResolver(new SingletonDependencyResolver<IDbConnectionFactory>(new SqlConnectionFactory()));

The question is whether or not, and to what degree, we should provide “sugar” methods that simplify this in some cases. Sugar methods make it easier to configure EF without knowing anything about the dependency resolver mechanism.

This breaks down further in two ways: general purpose sugar methods and service-specific sugar methods.

General purpose sugar methods

Many of the registered services are effectively Singletons. We can make this easier by providing methods that allow registration of Singletons, or indeed other types of lifetime such as transients and thread locals. For example:

RegisterSingleton<IDbConnectionFactory>(new SqlConnectionFactory());

We currently have the following lifetime types:

  • Singleton: the same instance is returned every time GetService is called
  • Transient: a new instance is returned each time GetService is called
  • Thread local: the same instance is returned every time GetService is called for a given thread, but a new instance is used for each thread
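As an illustration of how a lifetime maps onto a resolver, a transient lifetime could be a resolver that invokes a factory on every call. This is a sketch only; the class name and constructor shape are illustrative, not a committed API:

```csharp
// Illustrative sketch of a transient-lifetime resolver: a new instance is
// created by the supplied factory each time GetService is called.
public class TransientDependencyResolver<TService> : IDbDependencyResolver
    where TService : class
{
    private readonly Func<TService> _factory;

    public TransientDependencyResolver(Func<TService> factory)
    {
        _factory = factory;
    }

    public object GetService(Type type, object key)
    {
        // Return null for anything we don't handle so that other
        // resolvers in the chain can try.
        return type == typeof(TService) ? _factory() : null;
    }
}
```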

Service-specific sugar methods

In cases where we want more discoverability and documentation points for certain services we can provide specific methods for those services. For example:

SetDefaultConnectionFactory(new SqlConnectionFactory())

In particular, we may choose to do this for places where:

  • Existing methods have been obsoleted (currently only default connection factories)
  • It is very important to make setting the service discoverable and/or documented (e.g. registering an EF provider)
  • We anticipate that setting the service will be very common

We should also choose in such cases whether or not a getter method is also useful.
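A service-specific sugar method would be a thin wrapper over the general building block. A hypothetical sketch, assuming the SingletonDependencyResolver used earlier:

```csharp
// Hypothetical sketch of how a service-specific sugar method could be
// implemented in terms of the general AddDependencyResolver building block.
protected void SetDefaultConnectionFactory(IDbConnectionFactory connectionFactory)
{
    AddDependencyResolver(
        new SingletonDependencyResolver<IDbConnectionFactory>(connectionFactory));
}
```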

We currently resolve the following services:

  • IDatabaseInitializer<TContext> (for each TContext)
  • MigrationSqlGenerator
  • DbProviderServices
  • IDbConnectionFactory
  • IManifestTokenService
  • IDbCommandInterceptor
  • IDbProviderFactoryService
  • IDbModelCacheKeyFactory

Decisions:

  • We will add service-specific sugar methods for all services. This will improve discoverability and documentation and allow people to configure EF without an increase in concept count.
  • Given that we will have service-specific methods we do not need to have the general purpose methods.

Simplified API for common conventions

The idea here is to provide a simple way for developers to change/add common conventions without needing to understand the full pluggable conventions API. Some initial API ideas:

//Set Decimal precision, column type not necessary
modelBuilder.AllEntities()
    .Properties()
    .OfTypeDecimal()
    .HasPrecision(25,10)
    .HasColumnType("Decimal");

modelBuilder.AllEntities()
    .Properties()
    .OfTypeDecimal()
    .Where(d => d.Name == "DecimalProperty")
    .HasPrecision(18, 10);

//Set all guids with Key at the end of their name to be a key.
modelBuilder.AllEntities()
    .Properties()
    .OfType<Guid>()
    .Where(x => x.Name.EndsWith("Key"))
    .IsKey();

//Convert CamelCase class names to tables with lowercase names with underscores
modelBuilder.AllEntities()
    .ToTable(x => Regex.Replace(x.Name, "(?<=[a-z])(?<x>[A-Z])|(?<=.)(?<x>[A-Z])(?=[a-z])", "_${x}").ToLower());

//Add TableForBlog to the end of the blog table, but only for the Blog entity. 
modelBuilder.AllEntities()
    .Where(x => x.Name == "Blog")
    .ToTable(x => x.Name + "TableForBlog");

modelBuilder.AllEntities()
    .Properties()
    .Where(x => x.Name.EndsWith("Ignore"))
    .Ignore();

Notes:

  • AllEntities or similar method is a good way of covering a lot of the common cases for replacing conventions without needing to actually know about conventions
  • Consider using nested closure pattern instead of new bespoke fluent API
  • Implementation ideas:
    • Could be implemented as conventions or by applying the configuration directly to all matching entities
    • Order may matter—probably should just choose last wins
    • OfType should be able to work with non-mapped interfaces and classes

September 20, 2012

By-convention parameter binding SqlQuery overload

This is a proposed contribution by hintzen to add overloads of SqlQuery that accept a single object argument representing the set of parameter values. Binding would happen by convention based on the property names of the provided parameter object.

Decision: We would consider this contribution.
Feedback: The new methods should be overloads of SqlQuery/ExecuteSqlCommand.
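The binding itself could be a small reflection step that turns the parameter object’s properties into named DbParameters. A sketch only; the method name is hypothetical and the parameter-name prefix convention would need to match each provider:

```csharp
// Hypothetical sketch: map the public properties of a parameter object to
// DbParameters by name, so that e.g. new { Id = 5 } binds to a parameter
// named "Id" referenced in the SQL text.
public static void BindByConvention(DbCommand command, object parameters)
{
    foreach (var property in parameters.GetType().GetProperties())
    {
        var parameter = command.CreateParameter();
        parameter.ParameterName = property.Name;
        parameter.Value = property.GetValue(parameters, null) ?? DBNull.Value;
        command.Parameters.Add(parameter);
    }
}
```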

SqlCE relative path connection string UX issues with Migrations

Issue: Relative path connection strings in CE resolve to the VS installation folder when running migrations inside VS.
Decision: Considering auto-prepending "|DataDirectory|" as a mitigation. Brice investigating implications for different scenarios.

Pluggable Conventions

Polish the model representation

Should DbModel inherit from DbDatabaseMapping?

  • Initial concerns were around its mutability after calling Compile()
  • Leave it as is (a property on DbModel)

How much should we align with OData’s EdmLib?

  • Not worth adopting code
    • Cost of implementing SSDL & MSL
    • Freedom to evolve our API independently
  • Look at aligning names of types and properties where appropriate

Should we avoid calling it “EDM” and adopt a more EF-specific taxonomy?

  • Yes
    • Remove Edm and Db prefixes
      • Rename any conflicts
    • Change namespace to S.D.E.Model
      • .Database
      • .Mapping (directly under Model)

Should we address the duplication of concepts/code from System.Data.Entity.Core.Metadata?

  • Original idea was that this would replace the old stuff
    • We will probably never get around to doing this work

Need to address public setters on collection properties.

Polish the conventions API

Should we allow greater control over the order of conventions?

  • Is there a way to avoid calling ApplyPluralizingTableNameConvention from inside DbModelBuilder.Build?
  • Need a way to determine the order of default conventions (Documentation)
    • Is there a better way to group things? (e.g. keys before foreign keys)
  • Add AddBefore method
  • Could add conventions immediately after their derived type

Should any of our conventions be sealed? …not publically constructible?

  • No, keep open unless there is a compelling reason not to

Make it easy to override parts of the default conventions (similar to customizing our migrations provider).

The ConfigurationBase hierarchy needs to be simplified. Currently, there is one set that gets returned from the Fluent API and another that you use for writing conventions.

  • Like this because you don't have static types. Similar to having generic and non-generic APIs

CTP5 post mentions “It does not allow you to read information about the shape of your model.” Has this been fixed by making Edm public?

  • Fixed. Ensure there is a way to get to root EDM model from conventions

Are we correctly using Type vs. TypeInfo? (See Evolving the Reflection API)

  • Need to support net40
    • Did TypeInfo exist before net45?

Is there a way to fail during Add() if an IConfiguration will never get applied?

  • Find out specific combinations that don't work
  • May not be able to do anything aside from documentation

CTP5 post mentioned changing the IConfigurationConvention.Apply parameter to not be a Func<TConfiguration>.

  • Like this for lazy creation
  • Could have a wrapper
  • Could detect if configuration changed after Apply and throw it away if not

How does this work integrate with the dependency injection work?

Breaking change

The following code will no longer compile.

PrimitivePropertyConfiguration property = modelBuilder.Entity<MyEntity>().Property(e => e.MyProperty);

This is because PrimitivePropertyConfiguration is now generic.

August 23, 2012

Make ObjectContext implement IObjectContextAdapter

This was requested in the CodePlex discussion as a way to provide a common interface for both DbContext and ObjectContext from which the ObjectContext can be obtained in a common way. Not super-high value, but very easy to do and doesn’t impact the DbContext API at all.

Decision: We will create a work item for this and accept the contribution if it is made. (http://entityframework.codeplex.com/workitem/473)

Migrations multi-tenancy

Enable migrating multiple DbContexts in a single database

  • Problem: HistoryTable is single context
    • Solution? - Allow a single history table to store data for multiple contexts.
      • Hard to differentiate rows – can’t really use context type name as not refactor safe.
      • Can we allow the user to configure the tenant name, possibly with the context name as the initial default?
      • We could get the container name from the model, but this doesn’t work for automatic migrations.
      • Decision: For now we will go with multiple tables and will revisit this if necessary based on feedback.
    • Solution? – Allow multiple history tables.
      • Can already do this with HasDefaultSchema()
      • But should also be possible via renaming.
  • Exploratory testing needed: It seems like we have the correct building blocks but we may need to add sugar to make it easy for app developers to have multiple contexts
    • Consider the case where universal providers has a context/migrations; this should be transparent to the app developer
    • Consider the case where all the user wants is multiple contexts without any other changes—should be easy, possibly even with an F5 experience without a migrations configuration

History table customization

  • Introduce IHistoryContextFactory
    • Registered via DbConfiguration
      • Keyed singleton (key is type of MigrationsConfiguration)
    • Subclass HistoryContext to customize mapping in OnModelCreating.
    • Also useful for provider customization.
      • But probably need to incorporate provider name into resolver key.
  • Exploratory testing needed: Is it possible to change the HistoryRow entity itself? For example adding a new primary key?
    • This can be useful for some backends where the current MigrationID is not an ideal key
    • Could this be done with table-splitting combined with overriding SaveChanges?
  • It would be great if the migration provider could hook into the history table configuration
    • So, for example, just installing the provider makes the history table work without any additional intervention


Migration code file organization within project

  • Could try and support multiple sets within single Migrations folder (namespace).
    • But likely not worth it given:
      • Should already be able to have multiple “Migrations” folders in a single project. (via -ConfigurationTypeName in PS)
        • Seems to be a bug when adding migrations – they all go into the same folder
      • Should also accept unqualified config type name as shortcut.
      • Can also use multiple projects as a way to organize things.
    • So, keep it simple and listen for feedback.
  • Exploratory testing needed: What’s the experience like with multiple folders?

Designer versioning

Previous versions of the designer assemblies used version numbers based on Visual Studio versions (10.1, 11.1, etc.). What should we do with the open-source assemblies?

  • Change the strong names of the assemblies?
    • This allows us to version independently starting where appropriate without clashes
    • Decision: will do this, but need to check for any external dependencies on the assemblies
  • Semantic versioning: Yes
  • Where to start? 1.0? 6.0?
    • Decision: At least initially, start with version that aligns with runtime release (e.g. 6.0)
  • Current plan is for new designer to uninstall and replace old designer
    • Could decide to keep the old and only use the new for EF6
      • Will investigate this if it becomes problematic to support old EDMX/code gen/etc from new designer
    • Should the new designer support VS 2010 as well as VS 2012?
      • For now we will focus on VS 2012

Async protection

DbContext/ObjectContext are not thread safe, and making them generally so is prohibitively expensive for the common scenario of short-lived context instances. However, the async support now makes it easy to access the same context concurrently without it being obvious that this is happening: simply start using the context again after starting an async task, before that task has completed.

Can we help protect against this by adding checks that the context is being used while an async task is in progress?

  • Could add a flag that is set whenever an async task is in progress and not reset until it completes
  • Not possible to check this flag in all cases (not available) but could possibly be done in some cases (e.g. SaveChanges)
  • Need to be sure that whatever synchronization mechanism is used cannot result in false exceptions
    • If a task completes and sets the flag but another thread reads a stale version of the flag and throws this would be a serious problem
    • Need to also make sure that perf of doing the check is not prohibitive
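A minimal sketch of such a check, using Interlocked so that stale reads cannot cause false exceptions (the class and member names are illustrative, not a proposed API):

```csharp
// Illustrative sketch: guard synchronous entry points (e.g. SaveChanges)
// while an async operation is in flight. Interlocked operations establish
// memory barriers, so the flag is never read stale.
public class AsyncUsageGuard
{
    private int _asyncInProgress; // 0 = idle, 1 = async operation running

    public void EnterAsyncOperation()
    {
        if (Interlocked.CompareExchange(ref _asyncInProgress, 1, 0) != 0)
        {
            throw new InvalidOperationException(
                "An asynchronous operation is already in progress on this context.");
        }
    }

    public void ExitAsyncOperation()
    {
        Interlocked.Exchange(ref _asyncInProgress, 0);
    }

    public void EnsureNoAsyncInProgress()
    {
        // CompareExchange with identical values is a volatile read.
        if (Interlocked.CompareExchange(ref _asyncInProgress, 0, 0) != 0)
        {
            throw new InvalidOperationException(
                "This context is being used while an asynchronous operation is in progress.");
        }
    }
}
```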

August 9, 2012

Getting schema Information from providers

Providers are currently required to do some complex things in order to provide schema information about a database. The goal of this work is to simplify the way provider writers implement this functionality while minimizing the disruption to existing providers.

A one-pager with details will be pushed to the Feature Specifications page soon.

Discussion points/open questions:

  • Should the column class contain an ordinal field or is the order of the enumeration sufficient?
  • We should use the s-space EDM object model from Code First
    • This will need some cleanup before it can be made public, but this needs to be done for public conventions anyway
  • Can we provide a Code First model (i.e. a context and sets) to which providers can provide their own mapping?
    • Easy for some providers; hard for others where there really are not reasonable mappings
    • We can provide a non-context based interface (like a Repository) and then also provide a Code First model to back it. Use of the Code First model will make it very easy for some providers, but the Repository can also be implemented without it where mapping is not reasonable.

Using IDbDependencyResolver with existing IoC containers

An investigation was made into using Ninject as part of code-based configuration and for resolving EF dependencies. The main problems found were:

  • DbConfiguration must be creatable by tooling at design-time and it must have an Ninject “kernel” instance to resolve dependencies.
    • The kernel can be created inside DbConfiguration; this is fine.
    • If the kernel is also needed other places then it will have to be obtained in some way; this is not especially pretty but consensus is that it’s probably okay.
    • Need to investigate with other IoC containers.
  • Ninject throws if a dependency cannot be resolved.
    • This is pretty standard but doesn’t match the pattern we need which is to return null if the dependency can’t be resolved so that other resolvers in the chain have a chance to do the resolution.
    • Need to investigate further how to handle this—e.g. catch exception—in both Ninject and other containers.
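One possible shape for a Ninject adapter that translates Ninject’s throw-on-failure behavior into the return-null convention (a sketch, not a supported adapter; whether catching ActivationException is acceptable for all registrations needs the investigation noted above):

```csharp
// Sketch: adapt Ninject's throw-on-failure resolution to the EF convention
// of returning null so the next resolver in the chain gets a chance.
public class NinjectResolver : IDbDependencyResolver
{
    private readonly IKernel _kernel;

    public NinjectResolver(IKernel kernel)
    {
        _kernel = kernel;
    }

    public object GetService(Type type, object key)
    {
        // The key is ignored in this minimal sketch.
        try
        {
            return _kernel.Get(type);
        }
        catch (ActivationException)
        {
            return null; // let other resolvers in the chain try
        }
    }
}
```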

August 2, 2012

CUD Batching

There have been various CodePlex discussions around adding batching support to EF. These have resulted in some useful information and ideas which are being gathered together here with ideas from the team. This then creates a starting point for the EF team and/or the community in implementing this feature.

Currently when executing SaveChanges EF produces and executes discrete DB operations for each CUD operation. However SQL Server and some other backends provide a way of sending a set of operations as a single unit which may result in better performance, especially in situations where the connection to the database has high latencies.

This is the most voted issue on CodePlex with 14 votes and a close second on UserVoice with 1492 votes.

To support batching we would need to solve two problems:

  1. Enable batching in the SQL Server provider
  2. Change the provider model to enable different batching implementations

SQL Server

There are several approaches we could consider:

  • Append all operations in a single command text.
    • If server generated values are present we could either
      • Preserve the order of the operations and output the values in different result sets
      • Alter the operations to output the generated values to a temporary table and then query those values in a single statement
    • This option can “break” plan caching
      • Not clear that this will be a real problem in the wild
      • Benefits of batching will likely outweigh it, especially in high-latency scenarios
      • Is it possible to disable plan caching? Is it worth it?
      • Might be a reason for allowing batching to be switched off
  • For inserts it is possible to use a single statement (INSERT INTO … OUTPUT INSERTED … VALUES)
    • We need to do some prototyping to get perf measurements to see how much of an advantage this would provide
  • Use SqlDataAdapter
    • It seems the only public way to do this is with DataSets which is a dependency we should not take and may not provide real perf improvements anyway

Provider Model

Other backends might be restricted to one of the above options or have a different way of achieving this. We need a flexible model to accommodate them.

If any of the operations rely on a server generated value from a different operation they usually can’t be batched together. We can deal with this by either:

  • Splitting the batches in the update pipeline before sending to the provider
    • This is basically not dealing with the problem, but could be a first step
  • Add the notion of temporary keys to command trees
    • This requires the provider to understand temp keys and key propagation, which is non-trivial
  • Add client key generation strategies: GUID, sequences, hi-lo, etc.
    • When using key generation strategies the key propagation is handled by the state manager before sending anything to the provider which means that we would not need to split the batches

Different providers may have different limits on what they can batch. This means that the provider must be able to split the batch independently. Some options:

  • Send one update at a time to the provider.
    • The provider may choose to hold onto the update or send it to the database batched with previous updates.
    • If it sends the updates then it will return information back to us in terms of multiple result sets.
    • We will also tell the provider when we have no more updates to send so that the provider can finish the last batch.
  • Send one update at a time to the provider with the provider using an event to give information back when it decides to send the batch to the server.
  • Send all updates to the provider at once.
    • This could be a push from EF to the provider or allow the provider to pull updates from EF
    • The provider returns one or more data readers with multiple result sets back to EF

Open questions

  • Where should MaxBatchSize be exposed?

Possible classes for one-at-a-time approach (async version):

abstract class DbBatchCommand : IDisposable
{
    DbTransaction Transaction
    DbConnection Connection
    int Timeout
    DbBatchResult AddAndExecute(DbCommandTree)
    Task<DbBatchResult> AddAndExecuteAsync(DbCommandTree)
    DbBatchResult Execute()
    Task<DbBatchResult> ExecuteAsync()
}

class DbBatchResult
{
    bool Executed
    bool HasReader
    int RowsAffected
    DbDataReader Reader
}

abstract class DbProviderServices
{
    DbBatchCommand StartBatch()
}
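Under this shape, the update pipeline’s use of the API might look like the following. This is a sketch against the proposed (not final) surface; ProcessResults stands in for whatever consumes server-generated values:

```csharp
// Sketch: how the update pipeline might drive the proposed one-at-a-time
// batching API. The DbBatchCommand decides when to actually flush a batch
// to the server; Executed tells the caller whether results are available.
using (var batch = providerServices.StartBatch())
{
    batch.Connection = connection;
    batch.Transaction = transaction;

    foreach (var commandTree in commandTrees)
    {
        var result = batch.AddAndExecute(commandTree);
        if (result.Executed)
        {
            ProcessResults(result); // hypothetical: read generated values etc.
        }
    }

    // No more updates: tell the provider to finish the last batch.
    ProcessResults(batch.Execute());
}
```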

Bulk operations

There have also been CodePlex discussions about bulk operations. The idea here is to improve the perf of updates and deletes by providing a way to perform many of them in the server without first having to bring the entities into the context. Some suggestions for how the code might look are:

var user = new UserInfoFields(); 
var update = user.Update().Set 
( 
   user.Field1 = 1, 
   user.Field2 = "xxxx" 
).Where(user.Name == "Jim" && user.Enable); 

update.Execute();

context.Employees 
    .Where(e => e.Title == "Spectre") 
    .Update(e => new Northwind.Employee 
    { 
        Title = "Commander"
    });

There are two important questions that need to be answered about a bulk operation implementation:

  • Should calling the API cause the updates to happen immediately or should they happen when SaveChanges is called?
    • Deferring until SaveChanges is called at first seems to match existing EF behavior. However, SaveChanges is currently only concerned with writing changes that have been detected in tracked entities. For bulk updates to happen as well the state manager would have to track which bulk operations are also pending, which is a significant change in both mental model and implementation.
    • In addition, the APIs feel like they should send updates immediately, so deferring until SaveChanges could be unintuitive.
    • Decision: Use immediate execution
  • Should performing a bulk update affect local entities being tracked by the state manager?
    • If local entities are not touched then it would be very easy to have the state manager get out-of-sync with the database resulting in unexpected behavior on subsequent uses of the context. In other words, it would be easy for people to shoot themselves in the foot.
    • The problem is that it is not in general possible to know exactly what changes will be made to the database such that these changes can be reflected.
    • This could be due to the database being out of sync with the context or because of semantic differences in how we interpret the query compared to how the database interprets it—for example, string compare differences.
    • Decision: While it is not possible to be sure that the local changes will exactly match the database changes it seems that we may be able to get close enough to avoid most foot-shots. We should aim for this.

July 26, 2012

Async

  • Should we move all public async methods to a separate namespace as extension methods so they don’t pollute Intellisense?
    • Decisions:
      • We will leave async methods in S.D.E. primarily to reduce the number of namespaces that people will need to import.
      • Async versions of methods that LINQ to Entities does not support will be removed.
  • Should we remove the Create methods from IDbAsyncQueryProvider and derive it from IQueryProvider?
    • Decision: Yes
  • I changed Database.SqlQuery to return DbSqlQuery so that it could be enumerated asynchronously (without tracking) and DbSet.SqlQuery will now return DbSqlSetQuery that derives from DbSqlQuery.
    • Is this the right hierarchy and naming?
    • Should we provide singleton async methods on DbSqlQuery, since it doesn’t implement IQueryable?
    • Decisions:
      • We should not change the name of the existing class—unnecessary breaking change.
      • We need to find a name for the new class—probably will be done as part of polishing later
      • We should add instance async methods in the same way that we did for AsNoTracking

One-pagers

  • Async feedback
    • Current draft looks good; don’t need to list out all API
  • General goals/format
    • The one-pager should remain very high level and provide a short overview of the important points about the feature.
    • The audience is the community who wants an overview of the current state of the feature.
    • The format and sections are not rigidly defined and should reflect the important things the community would need to know. For example, it might include:
      • Goals
      • Non-goals
      • Dependencies
      • Design
      • API Usage
      • Challenges
      • Limitations
    • The document should be periodically updated as development progresses.
    • It will form the basis of a blog post when the feature is implemented to the level that we want to elicit wider feedback from the community.
    • Note that this is not an up-front design document or a comprehensive spec. It’s just a summary of where we are at with the design.

Initializer proposal

A potential contributor has started a discussion about adding a new initializer to our lineup. The initializer would be used when you don’t ever want EF to create a database (even if it doesn’t exist) but do want EF to check that the database exists and that the model matches, throwing if not. This allows a fail-fast in all cases where the database is not up-to-date. It should probably also fail if model information cannot be found in the database.

Two questions:

  • This seems like a useful initializer but should it go into EF.dll or is it better off in contrib or elsewhere?
    • Decision: The idea has merit for the core; i.e. we are interested.
  • Should it also fail if model information cannot be found?
    • Decision: Yes; it’s primarily a fail-fast check for people using Migrations.
  • Related: The IoC work requires a NullDatabaseInitializer for disabling database initialization (since null means something else). Should this be public?
    • Decision: Yes, it can be made public.
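A sketch of how such an initializer might look, per the decisions above. The class name is illustrative; the proposed contribution may differ:

```csharp
// Illustrative sketch: never create or modify the database; just fail fast
// if it is missing, has no model metadata, or does not match the model.
public class ValidateDatabaseInitializer<TContext> : IDatabaseInitializer<TContext>
    where TContext : DbContext
{
    public void InitializeDatabase(TContext context)
    {
        if (!context.Database.Exists())
        {
            throw new InvalidOperationException("Database does not exist.");
        }

        // throwIfNoMetadata: true makes a missing model hash a failure too,
        // per the second decision above.
        if (!context.Database.CompatibleWithModel(throwIfNoMetadata: true))
        {
            throw new InvalidOperationException(
                "The database does not match the current model.");
        }
    }
}
```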

July 12, 2012

Model key caching

When discovering a Code First model, DbContext has to decide whether to use a cached model or run OnModelCreating and the rest of the pipeline to create a model. In EF5 the key used for this cache lookup is a tuple of the derived context type and the provider invariant name. However, sometimes the same context type and provider need to be used for multiple models, such as when using different schemas for multi-tenant databases. Allowing the cache key to be injected makes this possible without the need to do custom model building and caching.

The implementation of this makes use of the dependency injection work as follows:

  • Key abstractions identified:
    • IDbModelCacheKey
      • (Equals, GetHashCode)
    • IDbModelCacheKeyFactory
      • Create(DbContext)
  • Create default implementations
    • DefaultModelCacheKey etc.
  • “Invert” control
    • Prefer .ctor injection
    • Can also use “poor man’s DI” to aid testing:

      public LazyInternalContext(IDbModelCacheKeyFactory cacheKeyFactory = null)
      {
          _cacheKeyFactory = cacheKeyFactory ?? new DefaultModelCacheKeyFactory();
      }
      
    • Go directly to the resolver if injection impractical

Notes:

  • Not all dependencies that can be injected need to be exposed explicitly in DbConfiguration. In this case a property was added but we have now decided to remove it. (Work item 373.)
  • The schema is not included into the cache key by default because we would need to run OnModelCreating to get it, and this has perf implications for large models.
    • We could add API to make the schema available without running OnModelCreating, but on balance it seems like we can instead make people aware of how to do it using the mechanism above without adding surface
    • We should blog on how to do this.
  • We need to document which interfaces/base classes the EF will try to resolve (Work item 374.)
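For the multi-tenant schema scenario above, an injected factory could fold the schema into the key so the same context type is cached once per schema. A sketch only; IHasSchema is a hypothetical application interface, not an EF type:

```csharp
// Sketch: include the tenant schema in the model cache key so the same
// context type can map to multiple cached models.
public class SchemaModelCacheKeyFactory : IDbModelCacheKeyFactory
{
    public IDbModelCacheKey Create(DbContext context)
    {
        var withSchema = context as IHasSchema; // hypothetical app interface
        var schema = withSchema != null ? withSchema.Schema : "dbo";

        return new SchemaModelCacheKey(context.GetType(), schema);
    }
}

public class SchemaModelCacheKey : IDbModelCacheKey
{
    private readonly Type _contextType;
    private readonly string _schema;

    public SchemaModelCacheKey(Type contextType, string schema)
    {
        _contextType = contextType;
        _schema = schema;
    }

    // Equality over (context type, schema) drives cache lookups.
    public override bool Equals(object other)
    {
        var key = other as SchemaModelCacheKey;
        return key != null
            && key._contextType == _contextType
            && key._schema == _schema;
    }

    public override int GetHashCode()
    {
        return _contextType.GetHashCode() ^ _schema.GetHashCode();
    }
}
```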

Migrations history table schema changes

Background:

  • DbModelBuilder now has a HasDefaultSchema() method to allow the schema that is used in the created model to be changed.
  • Ideally, this should also affect __MigrationHistory table so that
    • it is co-located with the other tables in the schema
    • and so that multiple history tables can exist in the same database
  • We then need to be able to migrate the history table along with the rest of the model
  • We probably want to allow further history model configuration in EF6 (table name etc.)

Current implementation:

  • Propagate default schema to history context
  • Include history model metadata in user metadata (transparently)
    • Create/Drop history becomes part of the standard pipeline.
    • Introduce “IsSystem” annotation so we can identify our metadata.
      • Can’t really rely on names

Issues:

  • Handling default schema changes is tricky:
    • “foo” -> “bar”, (Add|Get)-Migration fails because history is in “foo” but we look in “bar”
      • For explicit migrations we could successively try schemas from the code behind metadata.
      • But auto-migrations don’t really work because the current DB metadata is only in the history table!
        • Use Info Schema?
          • Could still find multiple history tables, which one do we use?
          • By design we don’t reflect into the DB from Migrations but we could change that.
  • Existing apps’ metadata doesn’t contain history metadata.
    • Use IsSystem in the differ to avoid producing false diffs.
    • When updating, inject history metadata into first user migration (in memory) if not present
  • SQL Server system objects cannot be moved
    • Do a table rebuild (SELECT * INTO foo.Bar FROM dbo.Bar)

Notes:

  • Adding the history table metadata to the user metadata is fine, but we should try to keep this a valid EDMX documentation to ensure we (or others) can easily parse and understand it in the future. (Work item 375.)
  • Is IsSystem enough or do we need to go with something more specific, such as IsHistory
    • IsSystem is okay as long as it works. Depending on what we do with the system-ness of the history table we could change it.
    • We will likely need to use IsSystem in conjunction with c-space names to ensure we can always find the correct metadata.
    • Putting history info in a different container was considered but would require considerable work in other areas, such as the model differ.
  • How do we find the history table after the schema changes?
    • For explicit migrations we will look in the code-behind.
    • For automatic migrations not only can we not find it but we can’t distinguish the case of not finding it from the case of it not being there because the database is new. This could cause Migrations to re-create all the tables in a new schema while all the existing tables still exist (with data) in the old schema.
    • Options:
      • Considered: Make it so that automatic migrations can only be used with the default schema
      • Considered: Provide some kind of API that forces users to declare a schema change explicitly
      • Decision: Make it so that changing the schema doesn’t automatically move the history table as well; you would have to do this with a manual step
  • Should the history table continue to be a system table?
    • Majority in the room believed that it should be
    • Majority believed that we should not add special code to allow this to be changed in EF6 (Current ways of changing it are okay.)
    • Code must continue to work with either a system table or a normal table
      • Migrating the history table to a new schema must account for the table-rebuild this requires

Async immediate LINQ operators

How should async operators like FirstAsync and CountAsync (that return single values instead of queries) work if the IQueryable is not our IQueryable?

  • If we throw it makes it a bit harder to mock
  • If we don’t throw it could look like these things work with other LINQ providers when they don’t
  • Decision: Throw and make sure that mocking can still be done. Document as necessary.
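A minimal sketch of the decided behavior, assuming a hypothetical IDbAsyncQueryProvider marker interface (the real API shape was not fixed at this point): the operator throws for foreign LINQ providers, but a mock can still opt in by implementing the marker on its query provider.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;

// Hypothetical marker interface; a mock IQueryable can implement this
// on its provider to make FirstAsync testable without a real database.
public interface IDbAsyncQueryProvider
{
    Task<TResult> ExecuteAsync<TResult>(Expression expression);
}

public static class QueryableAsyncExtensions
{
    public static Task<T> FirstAsync<T>(this IQueryable<T> source)
    {
        var asyncProvider = source.Provider as IDbAsyncQueryProvider;
        if (asyncProvider == null)
        {
            // Decision: throw rather than silently running synchronously,
            // which would make foreign LINQ providers look async-capable.
            throw new InvalidOperationException(
                "The source IQueryable does not support asynchronous operations.");
        }

        // Simplified: a real implementation would wrap the First() call
        // in a method-call expression before executing it.
        return asyncProvider.ExecuteAsync<T>(source.Expression);
    }
}
```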

July 5, 2012

Refactoring and unit tests

We have been doing quite a lot of refactoring in the core code recently. This is generally good, but is risky due to the lack of test coverage for the core. To mitigate this risk we should:

  • Provide a good description of the refactoring as part of the code review
    • The description should include some information on how the refactoring was tested
    • The description should be included in the commit comment so that it can be understood later
    • More generally, we need to make sure we provide good descriptions for all code reviews and that these descriptions get committed
    • This is essentially the same process we had in the old team
  • Make sure that any core code you touch is tested
    • If possible, write tests before refactoring
    • If the refactoring is to permit testing then make sure the code is well tested after the refactoring

June 28, 2012

Code contracts

Background

Currently we are using code contracts just for runtime verification of preconditions and don’t build the reference assemblies.

Any users that derive their types from the types defined in EF that want to use code contracts won’t be able to access the contracts that we defined.

Options considered

  1. Ship the reference assembly in the same NuGet package as the main assembly
    • This would unnecessarily bloat the package for users that don’t use code contracts. The reference assembly is about half the size of the main one.
    • It is possible to use explicit assembly references in the package so the reference assembly wouldn’t be added as a project reference as it’s not used during runtime.
    • Currently we don’t specify postconditions, so the reference assembly would be of limited value even to code contracts users.
  2. Ship the reference assembly in a separate NuGet package
    • This wouldn’t bloat the main package, but would be less discoverable and still have all the other cons listed above
  3. Don’t ship the reference assembly, but build it
    • Users with access to the source code will be able to build it when needed.

Decision

We won’t ship the reference assembly unless we get some requests to do otherwise. However, we need to improve the contracts at least on the public surface and consider enabling static verification.

June 21, 2012

Code-based configuration

Background

In EF 4.1 most configuration was done through code. In EF 4.3 we added the EntityFramework configuration section to allow configuration via config file.

Both code-based configuration and file-based configuration are useful. Code-based configuration can make use of IDE and compiler services (strong typing, Intellisense, etc.) and is flexible especially when coupled with dependency injection. File-based configuration can allow the same code to run in different environments without re-compiling.

The main problem with code-based configuration is making sure that the configuration is available to design-time tooling that does not run the application. The tooling must be able to find and execute (or otherwise interpret) the code. This is not possible if the configuration is performed by some arbitrary call made at some point during app startup.

Goals

  • If I don’t know about or care about using code-based configuration, then everything still works
    • In particular, EF 4.3 config file configuration is not changed
  • If I do want to use code-based configuration then it should be simple to do for the main developer scenarios
    • If I use it in this simple way, then tooling will also be able to find and use my code-based configuration
  • Whatever we add now should also form the basis for adding new configuration going forward
  • Less common scenarios (no context, multiple contexts, using Code First building blocks, etc.) should still be easy with code-based configuration
    • These scenarios may require additional steps but it should be easy to find out what the steps are

Basic Idea

We provide a DbConfiguration base class. To use code-based configuration a developer creates a class derived from DbConfiguration and places it in the same assembly as their context. Configuration settings are made in the constructor of this class.

public class MyConfiguration : DbConfiguration
{
    public MyConfiguration()
    {
        SetDefaultConnectionFactory(new LocalDbConnectionFactory("v11.0"));
        AddEntityFrameworkProvider("My.New.Provider", new MyProviderServices());
    }
}

Additional details

  • Can I specify the configuration type to use in the config file?
    • Yes. This overrides discovery and allows your DbConfiguration class to be contained in any assembly.
    • Design-time still works in this case because the config can (and should) be made available to the tooling
  • What if I run some code that needs to use configuration before I use my context type?
    • This just works if you’re not using code-based configuration or if you have specified your DbConfiguration class in the config file.
    • If you are using code-based configuration then you must set the DbConfiguration to use explicitly:
      DbConfiguration.Instance = new MyConfiguration();
    • If you don’t do this then we will throw when you use your context and we discover that you have a DbConfiguration class but didn’t set it.
    • We will also throw if you set the DbConfiguration to something we can’t discover.
  • What if I want to use EF without a derived context type at all?
    • This is the same as the previous bullet point except that we will never actually do the discovery
    • There is an assumption here that tooling will always make use of a derived context.
  • What if I have multiple contexts in multiple assemblies?
    • The easiest option is to specify the DbConfiguration class in the config file.
    • We may also choose to allow a special type of DbConfiguration class that just acts as a proxy to a class in another assembly.
  • What if I have a context that shouldn’t impact application configuration?
    • A good example of this is the HistoryContext used by Migrations. Using this context shouldn’t affect DbConfiguration resolution—it should just use whatever configuration the application is using.
    • If this context is in the same assembly as the DbConfiguration you want to use then it’s not a problem.
    • If it’s in a different assembly then you can put a type derived from DbNullConfiguration in this assembly. This tells EF to ignore DbConfiguration discovery for contexts in this assembly. In particular, it tells EF not to throw if a DbConfiguration is discovered in both this assembly and another assembly.
  • How does this work with dependency injection?
    • DbConfiguration is actually the place where the IDbDependencyResolver chain is rooted.
    • All configuration settings are resolved using the resolver chain.
    • When a configuration value is set this is implemented by adding a new resolver to the chain.
    • You can also add your own resolvers directly when constructing the configuration
  • Can I mutate the code-based configuration after it has been set?
    • Not directly because it encourages code that will be different when run in the application than when run at design-time.
    • The setter methods are protected to encourage usage from the constructor.
    • Once the configuration is set (either implicitly or explicitly) then it is locked and further attempts to modify will throw with info on the correct way to use DbConfiguration.
    • However, you can add a dependency resolver that can have behavior that changes as the application runs.
      • This doesn’t pose as much of a risk for design-time since the resolver is still added and must function in some way at design-time.
      • I’m currently using this in the functional tests to change the DefaultConnectionFactory to target SQL CE or LocalDb for some tests.
  • What if I set some config in both code and using the config file?
    • Config always wins.
    • The code below shows a CompositeResolver that ensures dependencies are always resolved from the config before other resolvers are tried.
  • What happens to the existing code-based configuration APIs?
    • We will obsolete (or remove) SetDefaultConnectionFactory
    • We may obsolete SetInitializer, but this will impact a lot of people.
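For the first bullet above, specifying the configuration type in the config file might look like the fragment below. The codeConfigurationType attribute name is an assumption based on the direction discussed here, not a final design.

```xml
<configuration>
  <configSections>
    <section name="entityFramework"
             type="System.Data.Entity.Internal.ConfigFile.EntityFrameworkSection, EntityFramework" />
  </configSections>
  <!-- Points EF (and design-time tooling) at the DbConfiguration class,
       which may live in any assembly. -->
  <entityFramework codeConfigurationType="MyApp.MyConfiguration, MyApp" />
</configuration>
```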

Comments/suggestions from the meeting

  • Consider having NuGet package generate a DbConfiguration class instead of creating/updating config
    • This would avoid the potential confusion of some config we create overriding config the user then sets
    • But it is harder to parse/update DbConfiguration code—for example, to switch connection factory
    • Also not clear how other packages would update/add to this code
  • Consider allowing packages to create code snippets that are collected together
    • No clear idea on how this would work
  • We should update existing APIs to allow their dependencies to be injected
    • We could in the future then try to remove the non-context based use of the configuration, but this would be a lot of changes to existing APIs
    • Alternately we could throw if non-context use happens and the config is not specified in the config file
    • Either way, we should understand which APIs currently use configuration
    • We will implement this as is for now, then iterate on it

Current code for DbConfiguration

public class DbConfiguration
{
    private readonly CompositeResolver<ResolverChain, ResolverChain> _resolvers
        = new CompositeResolver<ResolverChain, ResolverChain>(new ResolverChain(), new ResolverChain());
    
    private bool _isLocked;

    protected internal DbConfiguration()
        : this(new AppConfigDependencyResolver(AppConfig.DefaultInstance), new RootDependencyResolver())
    {
    }

    internal DbConfiguration(IDbDependencyResolver appConfigResolver, IDbDependencyResolver rootResolver)
    {
        _resolvers.First.Add(appConfigResolver);
        _resolvers.Second.Add(rootResolver);
    }

    public static DbConfiguration Instance
    {
        get { return DbConfigurationManager.Instance.GetConfiguration(); }
        set
        {
            Contract.Requires(value != null);

            DbConfigurationManager.Instance.SetConfiguration(value);
        }
    }

    internal void Lock()
    {
        _isLocked = true;
    }

    internal void AddAppConfigResolver(IDbDependencyResolver resolver)
    {
        Contract.Requires(resolver != null);
        CheckNotLocked();

        _resolvers.First.Add(resolver);
    }

    protected void AddDependencyResolver(IDbDependencyResolver resolver)
    {
        Contract.Requires(resolver != null);
        CheckNotLocked();

        // New resolvers always run after the config resolvers so that config always wins over code
        _resolvers.Second.Add(resolver);
    }

    [CLSCompliant(false)]
    protected void AddEntityFrameworkProvider(string providerInvariantName, DbProviderServices provider)
    {
        CheckNotLocked();

        AddDependencyResolver(new SingletonDependencyResolver<DbProviderServices>(provider, providerInvariantName));
    }

    [CLSCompliant(false)]
    public DbProviderServices GetEntityFrameworkProvider(string providerInvariantName)
    {
        // TODO: use generic version of Get
        return (DbProviderServices)_resolvers.Get(typeof(DbProviderServices), providerInvariantName);
    }

    protected void SetDatabaseInitializer<TContext>(IDatabaseInitializer<TContext> strategy) where TContext : DbContext
    {
        CheckNotLocked();

        AddDependencyResolver(new SingletonDependencyResolver<IDatabaseInitializer<TContext>>(strategy));
    }

    public IDatabaseInitializer<TContext> GetDatabaseInitializer<TContext>() where TContext : DbContext
    {
        // TODO: Make sure that access to the database initializer now uses this method
        return (IDatabaseInitializer<TContext>)_resolvers.Get(typeof(IDatabaseInitializer<TContext>), null);
    }

    public void SetDefaultConnectionFactory(IDbConnectionFactory value)
    {
        CheckNotLocked();

        AddDependencyResolver(new SingletonDependencyResolver<IDbConnectionFactory>(value));
    }

    public IDbConnectionFactory GetDefaultConnectionFactory()
    {
        return Database.DefaultConnectionFactoryChanged
#pragma warning disable 612,618
                   ? Database.DefaultConnectionFactory
#pragma warning restore 612,618
                   : (IDbConnectionFactory)_resolvers.Get(typeof(IDbConnectionFactory), null);

    private void CheckNotLocked()
    {
        if (_isLocked)
        {
            throw new InvalidOperationException("Configuration can only be changed before the configuration is used. Try setting configuration in the constructor of your DbConfiguration class.");
        }
    }
}

 

June 14, 2012

Making database connections in the test suite more configurable

Currently the vast majority of the EF6 tests only run against SQL Express. Some others are also set up to run against SQL Compact or LocalDb. In order to check things are working as expected against other backends it would be good if we could configure most tests to run against any type of database. The initial requirements for this are:

  • Check-in tests and main CI build should still run against SQL Express to provide fast check-in bar and CI feedback.
    • The run against other backends must happen frequently (at least once a day) but could be either triggered or scheduled.
  • We will still need some tests directed towards a specific backend (e.g. SQL Compact) where those tests are specifically designed to test behavior that is different for different database types.
    • In other cases the tests may be the same, but with backend-specific assertions such as is currently implemented for the Migrations tests.
  • The default for any new tests should be to run against any backend.
    • Attributes should be used to exclude a test from certain backends.
    • It should also be possible for assertions to be backend specific, as stated above.
  • The correct provider manifest token should be used for the backend in question. For example, it would be possible to use the 2005 manifest token when testing against SQL Server 2008, but this should not be done.

We will schedule this work item for EF6. Some ideas on implementation:

  • As much as possible only functional tests should hit the database. Some unit tests currently hit the database—these should be investigated and changed/moved appropriately.
  • The Migrations/XUnit infrastructure is a good place to start for this, but will need significant modification to meet the requirements.

June 7, 2012

Naming convention for static readonly and const fields

Decision:

  • All constants are pascal cased (e.g. PascalHasMoreHumps)
  • All fields are underscore camel cased (e.g. _humpLikeACamel)
  • Any public fields will be made into constants or encapsulated

How will we handle breaking changes in EF6?

When considering whether or not to make a breaking change in EF6 or later we will consider the overall customer experience, both long term and short term. In particular if the short-term impact of the change is small and the long term benefit is great, then we will take the change. A “small” short-term impact usually means one or more of the following:

  • The change will not affect people except in very corner cases
  • The change causes an immediate and easy-to-fix build break such that the chance of it causing a production bug are small
  • The change is really a bug fix and so fixing it will make more applications work correctly rather than break those that depend on the broken behavior

A breaking change that we are unlikely to take (without a flag to switch the new behavior on) would be one that breaks runtime behavior (rather than the build) in a subtle and/or difficult to fix way.

We are more able to take breaking changes in future versions of EF because we will not be releasing in-place updates and are making use of semantic versioning to signal significant breaking changes to consumers of our assemblies.

We will mark work items/bugs that result in breaking changes so that we can document those breaking changes when we release.

High-level ideas for using dependency injection with EF and specifics for the provider

Specific, current problem:

  • How can we get the EF provider from the config when the config has been overridden using DbContextInfo?
    • Many places that need the provider are not coupled to DbContext or DbContextInfo
    • Adding coupling to the context due to the dependency on the provider smells bad

More general problem:

  • We need a way to resolve dependencies (such as the EF provider) such that setting how the dependency is resolved is decoupled from uses of the dependency
  • In other words, we need an inversion-of-control (IOC) or dependency injection (DI) container

High level design:

  • We don’t want to be strongly coupled to any one DI container
    • Don’t want the binary dependency on the DI assembly
    • Don’t want to limit people to using a specific DI container when they may already be using and/or prefer another
  • We can follow the MVC model of having an IDependencyResolver interface into which other DI containers can be plugged
    • Learn lessons from MVC—for example, provide a way to release dependencies using the Release method
    • We need the ability to resolve by both CLR type and name—for example, the provider invariant name
    • it might look something like:
    public interface IDbDependencyResolver
    {
        object Get(Type type, string name);
        void Release(object service);
    }
    • Open questions following the design meeting:
      • Do we want additional overloads of Get that take just the type and/or just the name?
      • How does taking the type and the name work when plugging in various DI containers?
    • Note that we will provide generic extension methods to avoid the need to cast.
  • Provide an app-domain-wide registration point for the IDbDependencyResolver instance to use
    • We will set a default resolver that is used if you don’t know/want/need to use your own container
    • Will use a Chain of Responsibility pattern to allow dependency resolution to be overridden per dependency
    • Open issue: we need to figure out how this affects design-time scenarios and DbContextInfo
      • We should look at using attributed methods similar to those that ASP.NET uses
      • We could use the equivalent of a configuration class like we have for migrations
      • Look at using ServiceLocator or the equivalent
    • API might be something like:
    public static class DbDependencyResolver
    {
        public static IDbDependencyResolver Root
        {
            get { ... }
        }

        public static void Add(IDbDependencyResolver resolver)
        {
            ...
        }
    }
    • We decided not to provide a setter for the Root. There is no real need to change the root as opposed to adding a new resolver to the chain and we can simplify the code that uses the chain if we always know that the last one in the chain will be our resolver—for example, we can make the assertion that some dependencies will always return a value and will not be null. 
    • Possibly we don’t need to expose Root but rather just expose methods—for now we will expose Root
    • Based on decision for DbContextInfo we will probably need a Remove method to remove a resolver from the chain.
  • Internally, we will change places that have hard-coded dependencies to allow their dependencies to be injected
    • Hard-coded dependencies may be the use of new or access to a singleton
  • For public surface that implicitly uses a hard-coded dependency we will use the app-domain wide resolver
    • We will probably also provide public surface for the injected dependency
  • Depending on the scope of dependencies that part of the code needs we may choose to inject an IDbDependencyResolver or the contract for the specific dependency
    • Using IDbDependencyResolver allows multiple independent dependencies to be injected together and allows new dependencies to be added in the future without changing the API
    • Injecting the specific dependency is better where it is specifically needed by the code in question

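To make the chain-of-responsibility idea above concrete, here is a simplified sketch (this ResolverChain and SingletonResolver are illustrative, not the actual implementation): resolvers added later are consulted first, so user-added resolvers override the defaults at the end of the chain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IDbDependencyResolver
{
    object Get(Type type, string name);
    void Release(object service);
}

// Resolvers added later are inserted at the front of the list, so they
// are consulted first and can override earlier (default) resolvers.
public class ResolverChain : IDbDependencyResolver
{
    private readonly List<IDbDependencyResolver> _resolvers = new List<IDbDependencyResolver>();

    public void Add(IDbDependencyResolver resolver)
    {
        _resolvers.Insert(0, resolver);
    }

    public object Get(Type type, string name)
    {
        // First non-null answer wins; null means "not resolved here".
        return _resolvers
            .Select(r => r.Get(type, name))
            .FirstOrDefault(service => service != null);
    }

    public void Release(object service)
    {
        foreach (var resolver in _resolvers)
        {
            resolver.Release(service);
        }
    }
}

// A trivial leaf resolver that registers a single service instance.
public class SingletonResolver : IDbDependencyResolver
{
    private readonly Type _type;
    private readonly object _instance;

    public SingletonResolver(Type type, object instance)
    {
        _type = type;
        _instance = instance;
    }

    public object Get(Type type, string name)
    {
        return type == _type ? _instance : null;
    }

    public void Release(object service) { }
}
```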
How this solves the specific problem:

  • DbContextInfo adds a new dependency resolver to the default chain. This updates the app-domain wide configuration to use dependencies from the specified config:
         DbDependencyResolver.Add(new DefaultDependencyResolver(_appConfig));
    
  • We also looked at not changing the app-domain but instead having it configured onto the context and then flowed through to everywhere we use it:
     var extendedResolver = new ResolverChain(DbDependencyResolver.Root);
     extendedResolver.Add(new DefaultDependencyResolver(_appConfig));
     context.Resolver = extendedResolver;
  • The former has the advantages of simplicity for most of the stack and consistency for all the code no matter where it gets the root resolver from
  • The latter has the advantage that the DbContextInfo only sets the new resolver for the scope that it is in use for without changing the whole app-domain. However, uses of DbContextInfo are currently very limited and most based around app-domain modification anyway. We will provide a way for the DbContextInfo to remove its modification to the app-domain so that the changes can be scoped if needed.

May 31, 2012

Making EF code more testable

Currently many EF classes are sealed and/or have non-virtual methods. This makes it hard to create mocks for these classes. Options:

  • Use new mocking capabilities in .NET 4.5/Dev11
    • Pros:
      • No need to change existing classes just for testing
      • Well-defined public inheritance is retained as recommended by Framework guidelines
    • Cons:
      • Will tie us to .NET 4.5 which will be problematic for testing against .NET 4.
      • Doesn't act as a forcing function for generally improving design—e.g. introducing seams
  • Create wrapper classes so we can mock internal types
    • Pros:
      • No need to change existing classes just for testing
      • Well-defined public inheritance is retained as recommended by Framework guidelines
    • Cons:
      • Introduces an additional layer of indirection which is potentially not needed
      • Doesn't allow the more mockable types to be used by customers
  • Unseal classes and add virtual methods
    • Pros:
      • Allows customers to mock EF types more effectively as well as making it easier for us
      • Doesn't change the amount of code/indirection we have or tie us to .NET 4.5
      • Can still add internal classes for factoring where we want to change class responsibilities/design without breaking existing public surface
      • Public inheritance can be used in places we didn't anticipate it
    • Cons:
      • Goes against Framework guidelines in that public inheritance can now be used in places where the code doesn't anticipate it resulting in strange behavior in EF

Open questions:

  • Should we remove wrapper classes that have already been added?
    • Yes
  • Should we go through and make everything virtual in one go?
    • The problem with just making everything virtual and constructible is that it doesn't help introduce seams into the code that would allow appropriate dependency injection for people to use to substitute their own implementations or mocks
    • In the future we need to address this and move to a more open architecture
    • For now we will just use internal constructors and make methods virtual on a class-by-class basis as needed
    • We will look at introducing abstract base classes or interfaces for publicly interesting classes
    • If we need to jump through hoops to mock something then we look at introducing seams and using dependency injection and consider making it public
  • What do we do for mocking of classes that we don't own?
    • We should not use conditional compilation here
    • We can wrap external classes in proxies and use refactoring to ensure that the proxies are used internally while the public surface doesn't change.
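The proxy idea in the last bullet might look like the sketch below, using the static System.IO.File class as a stand-in for whatever external type needs wrapping (the EF types in question are not named in the notes):

```csharp
using System.IO;

// Internal code depends on this proxy instead of the static File class,
// so tests can substitute a mock by overriding the virtual members.
// The public surface of the product is unchanged by this refactoring.
internal class FileProxy
{
    public virtual bool Exists(string path)
    {
        return File.Exists(path);
    }

    public virtual string ReadAllText(string path)
    {
        return File.ReadAllText(path);
    }
}
```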


 

Code duplication in Async

It is not possible to use the same code for async and sync versions using normal mechanisms of re-use because of the way the compiler re-writes the code. This leads to some code duplication.

Possible solutions:

  • We could use T4 templates or conditional compilation. We decided not to do this because it adds complexity/overhead to the build which doesn't seem like a good tradeoff for the amount of duplicate code that is removed. Also, right now the async and sync methods are quite similar, but they will likely diverge as we implement more features at which point the value of the T4/conditionals decreases.
  • We could call the async method from the sync version and block. We won't do this because it will very likely have a large perf implication.
  • We will factor out non-async parts of the methods where we can and as appropriate and add documentation to the code to make sure people know that there are two versions of the method to be changed.
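The last bullet—factoring out the shared, non-async parts—might look like this sketch (all names are illustrative, and the execution stubs stand in for real database calls):

```csharp
using System.Threading.Tasks;

public class CountExecutor
{
    // Shared, non-async logic is factored into one place so both
    // versions of the method cannot drift apart.
    internal static string BuildCommandText(string table)
    {
        return "SELECT COUNT(*) FROM " + table;
    }

    // NOTE: this method has an async twin below; if you change the
    // execution logic here, change ExecuteAsync as well.
    public int Execute(string table)
    {
        var sql = BuildCommandText(table);
        return RunCore(sql);
    }

    public async Task<int> ExecuteAsync(string table)
    {
        var sql = BuildCommandText(table);
        return await RunCoreAsync(sql);
    }

    // Stubs standing in for real database execution.
    private int RunCore(string sql) { return sql.Length; }
    private Task<int> RunCoreAsync(string sql) { return Task.FromResult(sql.Length); }
}
```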

Supporting MVC scaffolding

Problem

The MVC scaffolding code makes use of types defined in the .NET Framework. This means that it will not work with EF6 when these types are pulled out of the .NET Framework into EntityFramework.dll.

Details

The MVC scaffolding code uses Dynamic Data. However, it doesn’t use the EF model provider built in to Dynamic Data but rather passes in a new EF model provider contained in the MVC assembly:

    MetaModel metaModel = new MetaModel(false);

    metaModel.RegisterContext(
        (DataModelProvider) new EntityFrameworkDataModelProvider(this._contextType, modelAssemblies));

Given this information it seems to me we have a few options:

  • We could add a method somewhere in EntityFramework.dll that would return a Dynamic Data model provider. We could essentially copy the code for this from MVC for EF5 to reduce the risk associated with adding this code. MVC could then call this method anytime it has a DbContext and this will continue to work when we update to EF6.
    • Pros: Makes MVC4 work with EF6; relatively low risk
    • Cons: Dependency on dynamic data; adds public surface that we probably don’t want
  • The Dynamic Data model provider in MVC could be updated to work entirely by Reflection against .NET or EF6 types
    • Pros: Makes MVC4 work with EF6; doesn’t introduce new dependencies into EF5
    • Cons: From looking at the code in MVC4 this seems like quite a lot of work and it would be easy to get it wrong, especially since we are still in the early days of EF6
  • We could implement a full MVC scaffolder in EntityFramework.dll or a separate assembly in our NuGet package
    • Pros: Makes MVC4 work with EF6; gives us control over the EF scaffolding so we can improve it when needed
    • Cons: Very high risk for EF5; EF takes dependency on MVC which seems wrong
  • We could do nothing now and then release an MVC scaffolder for EF6 with EF6 or as part of the MVC tooling update
    • Pros: Lowest risk for Dev11; gives us time to get the scaffolder right for EF6
    • Cons: MVC4 won’t work with EF6 without pulling in a new scaffolder; I’m not sure how easy it is to integrate a new scaffolder into MVC   

Decision:

  • We will update the EF scaffolder as part of the tooling update which will happen before EF6 goes RTM
  • We can release an EF6 scaffolder for people using pre-release versions of EF6; this will no longer be needed once the tooling updates

Last edited Nov 9, 2012 at 4:58 PM by BriceLambson, version 31

Comments

ajcvickers Jul 15 at 6:32 PM 
@jancg Design meeting notes for EF7 are on GitHub here: https://github.com/aspnet/EntityFramework/wiki/Entity-Framework-Design-Meeting-Notes

jancg Jul 15 at 12:59 PM 
Was the last meeting really 7 months ago?

pravinady Apr 16, 2013 at 4:05 PM 
Any idea on when can be the final release of EF6?

Thanks,
Ady.