Bulk (i.e. set-based) operations

Topics: EF Runtime
Jul 19, 2012 at 5:15 PM
Edited Jul 19, 2012 at 5:17 PM

Hi folks,

First off, great news on the open sourcing of Entity Framework! Congratulations to the team for making this monumental step.

Now, thought I'd be first to ask about a feature for the EF 6 roadmap. It would be great to see support for batching updates and deletes in EF using a similar syntax to https://github.com/loresoft/EntityFramework.Extended.

That same library has code for future query support.

It would be awesome if EF could integrate this, although that library only currently supports SQL Server - queries fail under SQLCE and presumably other providers.

Finally, it would be nice to have something similar to NHibernate's flexibility in lazy loading one-to-many relationships - see Ayende's blog entry on <set/> configuration. Controlling how a one-to-many relationship is retrieved can have a big impact on performance - a JOIN is not always the best route...

Cheers!

Dean

Jul 19, 2012 at 8:48 PM

I would also be really keen to have this batch style update/delete support added to the framework.

However I think performance is the leading driver for adding this type of query. I have done some tests on the EF Extended and found that the performance was quite poor. I think this would be an amazing (and much requested) feature to add in so long as we can ensure it performs faster than a regular EF delete query.

I would be really keen to contribute to this code.

Jul 21, 2012 at 11:51 PM
Edited Jul 21, 2012 at 11:53 PM

I agree that a bulk modification/deletion feature is useful, but I would be very careful about its implementation. It goes directly against the way EF saves changes. When you make changes through EF you must "commit" them by calling SaveChanges, but these bulk updates are executed immediately and, unless you use an outer TransactionScope, they are also executed in a different transaction (and you must also manually control the lifetime of the DB connection to avoid promotion to a distributed transaction).

I'm up for bulk updates and deletes from LINQ queries, but let's make it explicit that they are just immediately executed database commands which run outside of the current unit of work, i.e. they are just a more sophisticated version of ExecuteStoreCommand. That means these methods don't belong to the context or set classes. They belong to some database helper (for example the Database class in the DbContext API).

Including bulk updates in the unit of work (i.e. deferring their execution to SaveChanges) IMHO requires complicating the way changes are tracked, and I'm not sure it really makes sense for EF. There are multiple complications which must be solved:

  • Bulk modifications / deletes should be registered as commands for deferred execution in SaveChanges
  • Bulk modifications should work correctly for complex mapping scenarios where an entity is mapped to multiple tables (TPT inheritance, table splitting) - this must be done for immediate execution as well
  • Bulk modifications should be "parsed"
  • A "parsed" bulk modification should be applied to currently tracked entities - if you loaded an entity and this entity is affected by the bulk modification, I think the modification should be applied to the loaded entity.
  • A "parsed" bulk modification should be applied to entities loaded after registering the bulk operation but before actually executing the modification (SaveChanges)

I think the last two points involve too much magic inside the common way EF tracks changes.
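
To make the "runs outside of the current unit of work" point concrete, here is a minimal in-memory sketch. It is not EF code: Device, FakeStore, FakeContext and ExecuteBulkUpdate are hypothetical names invented for illustration, and the dictionary merely stands in for a database table. It only shows that an immediately executed set-based command changes the store while an already loaded entity keeps its old values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for this sketch.
public class Device
{
    public int Id { get; set; }
    public string Owner { get; set; }
}

// Stands in for the database table, keyed by primary key.
public class FakeStore
{
    public Dictionary<int, Device> Rows = new Dictionary<int, Device>();
}

// Hypothetical context: a tiny identity map over FakeStore, not an EF type.
public class FakeContext
{
    private readonly FakeStore _store;
    private readonly Dictionary<int, Device> _tracked = new Dictionary<int, Device>();

    public FakeContext(FakeStore store) { _store = store; }

    // Loading copies the row into the unit of work.
    public Device Find(int id)
    {
        Device entity;
        if (!_tracked.TryGetValue(id, out entity))
        {
            var row = _store.Rows[id];
            entity = new Device { Id = row.Id, Owner = row.Owner };
            _tracked[id] = entity;
        }
        return entity;
    }

    // Immediate set-based update: goes straight to the store.
    // Tracked copies are deliberately not touched.
    public int ExecuteBulkUpdate(Func<Device, bool> where, Action<Device> set)
    {
        var hits = _store.Rows.Values.Where(where).ToList();
        hits.ForEach(set);
        return hits.Count;
    }
}
```

If a device owned by "alice" is loaded through Find and ExecuteBulkUpdate then reassigns all of alice's devices to "bob", the row in the store reads "bob" while the loaded instance still reads "alice" until it is reloaded.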

Coordinator
Jul 22, 2012 at 3:13 AM
Edited Jul 22, 2012 at 3:13 AM

Hello everyone,

Just wanted to give the heads up that I have updated the title of the thread to refer to bulk operations (or as we often refer to them in the EF team, set-based operations) instead of batching and futures, which are more about reducing the number of database round-trips necessary to send multiple discrete CUD operations (batching) or multiple SELECT queries (futures). There is another thread about batching going on at http://entityframework.codeplex.com/discussions/377636.

Hello Ladislav,

I think this is a good analysis. I personally also see bulk operations as something that would almost necessarily be separate from the current unit of work for the same reasons (I am missing the bit about modifications being "parsed" though, could you please elaborate?). That said, it is very interesting to me to think about the concept of units of work that could be processed entirely on the server without bringing objects into memory, but I believe this is an entirely separate feature.

Thanks,
Diego

Jul 22, 2012 at 10:53 AM

Hi Diego,

Parsing wasn't the best term to describe the idea. Let me explain it with a simple example:

  • EF is updated to support deferred execution of bulk operations as part of the unit of work
  • You have an entity which contains an Owner property.
  • You load a few instances of the entity into your current unit of work
  • You decide to register a deferred bulk modification in the same unit of work. This bulk modification will change the Owner property for multiple records
  • What about entities which are already loaded - should EF apply the modification of the Owner property immediately to those entities which match the modification condition? I think it should, because this modification is part of the unit of work and it should be visible in your application for the rest of the code running in the same unit of work. But to do that, EF must understand the bulk command which will later be executed in the database - that is what I called parsing. I also think that the same approach should be used for entities materialized after registering the deferred bulk modification (and prior to saving changes).
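
The steps above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not a proposal for EF internals: Ticket, PendingBulkUpdate, DeferredContext and RegisterBulkUpdate are hypothetical names, the dictionary stands in for the database, and the predicate and change are plain delegates rather than a command EF would have to analyze.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity with the Owner property from the example.
public class Ticket
{
    public int Id { get; set; }
    public string Owner { get; set; }
}

// A registered bulk modification, kept in a form the context can
// also evaluate in memory (the "parsed" command).
public class PendingBulkUpdate
{
    public Func<Ticket, bool> Where;
    public Action<Ticket> Set;
}

// Hypothetical unit of work; the dictionary plays the database table.
public class DeferredContext
{
    private readonly Dictionary<int, Ticket> _table;
    private readonly Dictionary<int, Ticket> _tracked = new Dictionary<int, Ticket>();
    private readonly List<PendingBulkUpdate> _pending = new List<PendingBulkUpdate>();

    public DeferredContext(Dictionary<int, Ticket> table) { _table = table; }

    public Ticket Find(int id)
    {
        Ticket entity;
        if (_tracked.TryGetValue(id, out entity)) return entity;

        var row = _table[id];
        entity = new Ticket { Id = row.Id, Owner = row.Owner };
        // Entities materialized after registration also see pending changes.
        foreach (var op in _pending)
            if (op.Where(entity)) op.Set(entity);
        _tracked[id] = entity;
        return entity;
    }

    public void RegisterBulkUpdate(Func<Ticket, bool> where, Action<Ticket> set)
    {
        _pending.Add(new PendingBulkUpdate { Where = where, Set = set });
        // Apply immediately to matching entities that are already loaded.
        foreach (var entity in _tracked.Values.Where(where).ToList())
            set(entity);
    }

    // Only here does the bulk modification reach the "database".
    public void SaveChanges()
    {
        foreach (var op in _pending)
            foreach (var row in _table.Values.Where(op.Where).ToList())
                op.Set(row);
        _pending.Clear();
    }
}
```

In this model a ticket loaded before the registration and a ticket loaded afterwards both show the new Owner right away, while the table itself only changes in SaveChanges. The hard part in EF would be getting the same in-memory behavior from the command that is actually sent to the database.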

Ladislav

Jul 23, 2012 at 12:31 AM

Hey Ladislav

I agree that special thought needs to go into where such bulk queries are executed in-line with the current SaveChanges stream. As you have rightly stated, it's a pretty complicated problem. From what has been discussed (and my own thoughts) there appear to be several approaches to where these queries could fit.

  1. Immediate execution in-line with ExecuteStoreQuery
    • Pros
      • Fairly simple implementation
    • Cons
      • What happens to context changes to the entities modified by regular EF actions affecting the same data?
  2. Deferred execution until SaveChanges, ignoring manual application to tracked entities
    • Pros
      • Fits with EF SaveChanges model
    • Cons
      • When in SaveChanges do you execute the bulk query? Does it mean you need a time-based split on tracked changes, which sounds complicated and not in line with current EF?
      • User doesn't get a 'true' representation in the local dataset of the changes they have made
  3. Deferred execution until SaveChanges, manually apply change to tracked entities
    • Pros
      • Fits with EF SaveChanges model
      • User gets changes applied to local state
    • Cons
      • Bulk operations take longer to apply as they have to enumerate local collections
      • You may need to do multiple updates to the same entity (i.e. one for each applicable bulk operation plus one for the tracked changes update), which is not ideal from a performance standpoint
      • Reasonably complex to implement
  4. Deferred execution until SaveChanges, exclude tracked entities from the query and manually apply change to tracked entities
    • Pros
      • Fits with EF SaveChanges model
      • User gets changes applied to local state
      • Doesn't double up updates
    • Cons
      • Very complex
      • Bulk operations take longer to apply as they have to enumerate local collection

My feeling is that as Ladislav has suggested 3 is a pretty good option.

Do these sum up the ideas so far?

Does anyone have any other options to explore?

Coordinator
Jul 23, 2012 at 5:22 PM

Hey Guys,

It's great to see this discussion going on. The feature request that matches this discussion is http://entityframework.codeplex.com/workitem/52.

If anyone is planning to tackle the code for this feature I just want to encourage you to reach out to our team for help working out how to implement it. This would be a pretty complex feature and for us to accept it back into the main code base it would need to work with the provider model etc. We're certainly on board with helping you get familiar with those parts of the code base.

~Rowan

Jul 30, 2012 at 3:27 PM

There is an open source project that implements something along those lines:

http://efe.codeplex.com/

Samples:

this.Container.Devices.Update(o => new Device() { LastOrderRequest = DateTime.Now, Description = o.Description + "teste" }, o => o.Id == 1);

this.Container.Devices.Delete(o => o.Id == 1);

Aug 15, 2012 at 5:22 PM
lukemcgregor wrote:

I have done some tests on the EF Extended and found that the performance was quite poor. 

What was the scenario you found to have poor performance? As the author of EntityFramework.Extended, I'm interested in what I can do to improve the performance and usability. 

~ Paul

Aug 15, 2012 at 5:51 PM
RoMiller wrote:

If anyone is planning to tackle the code for this feature I just want to encourage you to reach out to our team for help working out how to implement it. This would be a pretty complex feature and for us to accept it back into the main code base it would need to work with the provider model etc. We're certainly on board with helping you get familiar with those parts of the code base.

I'm interested in possibly looking into implementing this in the EF code base. When creating EntityFramework.Extended, I had to wrap the update or delete in the provider-generated select with a join. It would obviously be better if the provider itself could generate and execute the command.

Having provider support would be step one of the implementation.  I assume the best way to pass to the provider would be some variation of DbModificationCommandTree? I'm not sure if DbDeleteCommandTree or DbUpdateCommandTree would work as they might be expecting a single result. What about backward compatibility with providers?  Is there a way to know what the provider supports?

Integration with change tracking or local store would be nice but quite complex.  I'm thinking that could be a phase 2 implementation.  

Delayed execution till SaveChanges would also be nice, but again, phase 2. 

What is the favored API design? I've seen several variations of APIs that do the same thing. The design I implemented in EntityFramework.Extended uses an extension method of DbSet or ObjectSet. The syntax is as follows ...

context.DbSet.Delete(whereExpression);
context.DbSet.Update(whereExpression, updateExpression);

Variation could be ...

context.DbSet.Where(expression).Delete();
context.DbSet.Where(expression).Update(updateExpression);

Or ...

context.DbSet.Where(expression).DeleteAll();
context.DbSet.Where(expression).UpdateAll(updateExpression);
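
For comparison, here is a rough sketch of how the first shape could look as extension methods. It is an assumption-laden illustration and not the EntityFramework.Extended implementation: it runs against an in-memory ICollection<T> instead of a DbSet, Item and BulkSketchExtensions are made-up names, and the update expression is assumed to be a member initializer over properties (x => new T { Prop = ... }). A real version would hand the expressions to the provider for translation rather than compiling them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity used only for this sketch.
public class Item
{
    public int Id { get; set; }
    public string Description { get; set; }
}

public static class BulkSketchExtensions
{
    // Delete(whereExpression): removes matching elements, returns the count.
    public static int Delete<T>(this ICollection<T> set, Expression<Func<T, bool>> where)
    {
        var victims = set.Where(where.Compile()).ToList();
        foreach (var victim in victims) set.Remove(victim);
        return victims.Count;
    }

    // Update(whereExpression, updateExpression): applies each
    // "Property = value" binding of the initializer to matching elements.
    public static int Update<T>(this ICollection<T> set,
        Expression<Func<T, bool>> where, Expression<Func<T, T>> update)
    {
        var init = update.Body as MemberInitExpression;
        if (init == null)
            throw new ArgumentException("Expected x => new T { ... }", "update");

        // One compiled value delegate per assigned property.
        var setters = init.Bindings.OfType<MemberAssignment>()
            .Select(b => new
            {
                Property = (PropertyInfo)b.Member,
                Value = Expression.Lambda(b.Expression, update.Parameters).Compile()
            })
            .ToList();

        var hits = set.Where(where.Compile()).ToList();
        foreach (var row in hits)
        {
            // Evaluate all new values against the old state first,
            // then assign, as a SQL UPDATE would.
            var values = setters.Select(s => s.Value.DynamicInvoke(row)).ToList();
            for (int i = 0; i < setters.Count; i++)
                setters[i].Property.SetValue(row, values[i], null);
        }
        return hits.Count;
    }
}
```

With a List<Item> named items, a call would read items.Update(i => i.Id == 1, i => new Item { Description = i.Description + "!" }) or items.Delete(i => i.Id == 2). The Where(...).Delete() variations would instead hang the methods off IQueryable<T>, so the filter is expressed with the normal LINQ operator and only the terminal call is new.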

thanks,

~ Paul

Developer
Aug 17, 2012 at 12:33 AM

Hi Paul,

Thanks for the comments. We discussed this in the EF design meeting a couple of weeks ago and made some decisions. You can check out the notes here: http://entityframework.codeplex.com/wikipage?title=Design%20Meeting%20Notes.

We don't currently have anybody working directly in this area because they are heads-down finishing async work right now, but we hope to come back to it in a few weeks and provide some more guidance. If you have additional questions after reading the notes we will try to address them as soon as possible, and also if you feel you have enough info to start prototyping something then that would be great.

Thanks,
Arthur