[UpForGrabs] [Performance] Reduce start up time by loading finished Code First models from a persistent cache

description

Building and compiling large models using the Code First pipeline can be expensive in terms of start up time. Several pieces are coming together that could allow us to serialize the output of Code First (including the O-C mapping) into an XML artifact and then to deserialize it directly on subsequent runs.

Creating an efficient and completely reliable way of verifying that the serialized version of the model actually matches the Code First model doesn't seem feasible, but there are simple heuristics, e.g. checking for file timestamps of the assembly containing the model and the timestamp of the XML artifact that should work reasonably well in common scenarios.
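
The timestamp heuristic described above could be sketched roughly as follows. This is only an illustration, not the prototype's code: the class name `ModelCacheCheck`, the method `IsCacheStale`, and both path parameters are hypothetical, and only standard `System.IO` calls are used.

```csharp
using System;
using System.IO;

public static class ModelCacheCheck
{
    // Hypothetical helper: returns true when the cached XML artifact is
    // missing or older than the assembly containing the model, meaning
    // the artifact should be regenerated instead of loaded.
    public static bool IsCacheStale(string assemblyPath, string cachePath)
    {
        if (!File.Exists(cachePath))
        {
            return true;
        }

        // Compare in UTC so a time zone or DST change cannot flip the result.
        DateTime assemblyWritten = File.GetLastWriteTimeUtc(assemblyPath);
        DateTime cacheWritten = File.GetLastWriteTimeUtc(cachePath);
        return assemblyWritten > cacheWritten;
    }
}
```

A caller would rebuild and re-serialize the model when `IsCacheStale` returns true and deserialize the artifact otherwise. As noted above, this is a heuristic: a model change that does not touch the assembly file's timestamp would go unnoticed.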

Initial tests with a hacky prototype show that the start up time of a model with 100 entities can go down from 8 seconds to 2 seconds using this approach.

comments

rothdave wrote Nov 28, 2013 at 8:25 AM

I like this idea. In fact, before we migrated to EF, we used a similar approach in NHibernate. On startup, we checked whether the timestamp of the dll containing the models was newer than the timestamp of the xml file. If so, we regenerated the xml. Otherwise we loaded the cached one.

Could you please share your prototype with us, so that I can try it out? (Doesn't matter if it's hacky, I am just interested in the bits ...)

RoMiller wrote Dec 12, 2013 at 9:30 PM

EF Team Triage: Moving issues with Impact set to Low out of the 6.1.0 release as we only have time to address High and Medium issues in this release. We will re-triage these issues for future releases.

This does not exclude someone outside of the Microsoft EF team from contributing the change/fix in 6.1.0.

rothdave wrote Dec 16, 2013 at 9:12 AM

@RoMiller: How is the "impact type" defined in EF?
This feature would massively reduce startup time (8 vs 2 seconds) for the model, so in my opinion this is a high-impact task.

However, as already asked in my previous post, I would be happy if you could share your initial prototype with the community. This would help contributors to understand how this feature could be implemented.

thanks.

ChrisMaeder wrote Dec 30, 2013 at 10:40 AM

This is impacting us fairly severely with our database implementation, so I feel that this should be categorized as a high-impact issue.
Thank you

RoMiller wrote Dec 31, 2013 at 9:46 PM

Impact is a combination of a few things, but most notably how severe the issue is and what percentage of our customers it impacts. I'm clearing the Impact field on this one though as I agree it's probably not low.

rothdave wrote Jan 9, 2014 at 3:17 PM

Could you please share the branch with the prototype of this feature?

Duality wrote Feb 12, 2014 at 10:11 AM

Somewhat at random: I made a repo for my EF Code First work, and my approach to building dynamic contexts seemed to save 0.5s over a normal Code First context in my performance benchmarks... until I realised that most of the difference was because my dynamic context contained:
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        Database.SetInitializer<MyContext>(null);
    }
I wonder how many production implementations could start quicker if the initializer didn't run, maybe default it to null / none?
Anyway, sorry to wander off the track.

rothdave wrote Mar 18, 2014 at 10:05 AM

Now that EF 6.1 RTM is out, will this issue be considered for the next release?
Startup time is an important aspect in our application, and the numbers you posted here look very promising.

If not, could you please share the prototype with the community?

emilcicos wrote Mar 18, 2014 at 4:39 PM

The initial prototype, while a proof of concept from the perf. point of view, was very rough because it was missing a lot of "plumbing". Not long ago I started improving it a bit, but I was side-tracked with other tasks. I will try to wrap up what I have this weekend and share it.

rothdave wrote Mar 18, 2014 at 5:35 PM

Thanks for your feedback emilcicos! Great news that you are working on this subject!

emilcicos wrote Mar 25, 2014 at 3:28 PM

I attached a .zip containing a .patch file with the prototype and a .cs file with sample code.
I only did some basic testing and it has not yet gone through our official review process, but if you find it useful I can iterate on it as needed.

rothdave wrote Mar 25, 2014 at 5:06 PM

Thank you so much emilcicos!! That is fantastic! I just applied your patch to current master and the performance boost is super awesome!

These are my results without an NGen'd EF (not possible because I cannot strong-name sign my custom build with your patch):

First-Query-Time:
  • Without DbModelStore: 4.5 seconds
  • With cached DbModelStore: 1.8 seconds
So I guess if I could NGen this build, the startup time would be about 0.6 seconds (an NGen'd EF saves about 1.2 seconds on my machine), which is very nice!

So I really hope your work will be in the EF-Alpha channels soon, so I can use it in combination with Ngen :)
Thanks :)

rothdave wrote Mar 27, 2014 at 12:14 PM

I have a question regarding the heuristics for checking whether the generated xml is still valid. At the moment your solution does not check whether the model has changed, which will lead to runtime errors when the cached xml is stale.

One simple solution would be to check against the timestamp of the model assembly.
However, in the case of pregenerated views, EF can automatically determine the differences via hashing and transparently regenerate the view files if necessary.
Is something similar possible or would this eliminate the performance gains?
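
For comparison with the timestamp check, a hash-based validity check could look roughly like this. It is a sketch under assumptions, not what EF does for pregenerated views: it hashes the model assembly file itself with SHA-256 and stores the result in a sidecar file next to the cache. The names `ModelCacheHash`, `ComputeFileHash`, `IsCacheValid` and `RecordHash`, and the sidecar file, are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ModelCacheHash
{
    // Computes the SHA-256 hash of a file as an uppercase hex string.
    public static string ComputeFileHash(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }

    // Hypothetical check: the cache is valid only if the hash recorded when
    // it was written still matches the current assembly file.
    public static bool IsCacheValid(string assemblyPath, string hashPath)
    {
        if (!File.Exists(hashPath))
        {
            return false;
        }

        string stored = File.ReadAllText(hashPath).Trim();
        string current = ComputeFileHash(assemblyPath);
        return string.Equals(stored, current, StringComparison.OrdinalIgnoreCase);
    }

    // Records the current assembly hash; call this after writing the cache.
    public static void RecordHash(string assemblyPath, string hashPath)
    {
        File.WriteAllText(hashPath, ComputeFileHash(assemblyPath));
    }
}
```

The trade-off is that the whole assembly file has to be read on every start, so whether this keeps the performance gains would need to be measured against the time saved by loading the cached model.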

rothdave wrote Mar 31, 2014 at 7:24 AM

In case anyone is also using the cached db model store: here is my implementation of DefaultDbModelStore which invalidates/removes the cached xml file, if the last write time is prior to the last write time of the corresponding domain assembly:

https://gist.github.com/davidroth/9886349

Maybe this will be useful for some people.

rothdave wrote Apr 16, 2014 at 7:21 AM

@EfTeam/Emilcios: Could you please let me know if this feature will get into master branch any time soon?
This would be very nice, because then I could use NGen again, which is currently not possible with my self-built assembly. I also lost the advantage of using the EF alpha channels.

I would be happy if you could write me a quick reply so that I know if it's worth rolling out a real self-signed assembly myself.

thanks.

ajcvickers wrote Apr 16, 2014 at 5:12 PM

@rothdave This work item is currently in the "Future" release which means that we don't plan to include it in the next release of EF.

robepstein wrote Oct 7, 2014 at 7:10 PM

Those of us with large models and unacceptable application startup times consider this of critical importance and worthy of immediate inclusion. Given that there is already a patch that integrates the needed functionality, what is the harm in integrating the patch and starting testing with the feature?

tidyui wrote Oct 16, 2014 at 11:06 PM

I usually don't even have large entity models in the projects I work with (about 20 entities) and the startup time is really horrid.

Consider building a large-scale web application using EF. With this type of startup penalty, the entire performance of the site is dependent on caching, either by third party CDN services or by just caching rendered views to disk to serve while the application warms up.

In my opinion, every customer requesting the startpage of the site, could be a customer potentially accessing the application after it's been disposed and has to be restarted.

With this said, I really hope the current patch is integrated soon, and keep up the good work!

divega wrote Jan 13 at 6:05 PM

Linking to https://entityframework.codeplex.com/workitem/2631, which seems to be a dupe with some additional information and votes.

BrandonDahler wrote Jan 28 at 4:40 AM

Instead of saving to an xml file and relying on the last modified date and other metadata, I propose we should use a resource file that automatically invalidates itself depending on whether the dll has been recompiled or not.

I wrote a basic proof of concept that reads and writes resources to an XYZ.{exe,dll}.entitycache file and automatically invalidates it when the executable is modified. No special assembly references are required.
using System;
using System.Collections;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Resources;

namespace Test
{
    public static class Program
    {
        public static void Main(string[] args)
        {
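            var assembly = Assembly.GetExecutingAssembly();
            // ModuleVersionId is a GUID the compiler stamps on the module at build time.
            var currentModuleVersionId = assembly.ManifestModule.ModuleVersionId;
            Console.WriteLine(currentModuleVersionId);
            Console.WriteLine();

            // Assume a rebuild is needed until a matching stored ID is found.
            var requiresRebuild = true;
            var resourcePath = assembly.Location + ".entitycache";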

            if (File.Exists(resourcePath))
            {
                using (var resourceReader = new ResourceReader(resourcePath))
                {
                    // Load all entries so the stored ID can be looked up by key.
                    var resourceDictionary = resourceReader.OfType<DictionaryEntry>()
                        .ToDictionary(de => de.Key, de => de.Value);

                    // Rebuild only if the stored ID differs from the current one.
                    if (resourceDictionary.ContainsKey("_.Assembly.ModuleVersionId"))
                    {
                        var storedModuleVersionId = (Guid) resourceDictionary["_.Assembly.ModuleVersionId"];
                        requiresRebuild = (storedModuleVersionId != currentModuleVersionId);
                    }
                }
            }

            if (requiresRebuild)
            {
                Console.WriteLine("Rebuilding resource file.");
                using (var resourceWriter = new ResourceWriter(resourcePath))
                {
                    resourceWriter.AddResource(
                        "_.Assembly.ModuleVersionId",
                        currentModuleVersionId);
                }
            }
            else
            {
                Console.WriteLine("Rebuild not required.");
            }
        }
    }
}
I personally plan to work on this issue next, but no particular promises on timeline as I do my contributions in my free time.

Let me know of any concerns/thoughts on taking the XYZ.{exe,dll}.entitycache approach.