15

Closed

Native support for Enumerable.Contains

description

Currently Enumerable.Contains like this:

new [1,2,3].Contains(a)

Is translated to an sub-tree of expressions equivalent to

1 == a || 2 == a || 3 == a

This expression tree is kept balanced to avoid stack overflows in our visitor code and finally translated to an IN expression in SQL-gen:

a IN (1,2,3)

We do a lot of in-memory processing to do both translations and the numbers of elements in the list has been reported to have exponential impact on the performance of query translation:

http://stackoverflow.com/questions/7897630/why-does-the-contains-operator-degrade-entity-frameworks-performance-so-drama/7936350#7936350

We can avoid the whole issue if we add support for a DbExpression that can hold a collection parameter directly and also have IN/Contains semantics. The addition of this expression would imply a change to the provider model, but the benefits would be huge for this kind of query when there are many elements.

Alternatively we could profile the processing that we do in this case and see if there is any low hanging optimization we can still do in the code.
Closed Jan 7, 2013 at 4:26 PM by BriceLambson

comments

Dave69B wrote Oct 5, 2012 at 11:13 AM

I've just encountered this. My users have access to depots. In views screens they can filter by any selected subset of these. On a clients site where a user has access to 100 it takes 4 seconds to build the query and 0.1 to execute it. e.g
var query = from fb in FITTERBOOKINGS
where fb.BOOKINGDATE == theDate && fb.DRIVER != string.Empty
&& selectedDepotList.Contains(fb.DRIVERS.DEPOT)
select fb;

My screen full refresh time exploded from 1 second to 12 as there are 4 queries that must run. I must admit I did not check for support of a SQL IN when comparing EF against other options.

Please, please, please add support for array.Contains as a SQL in before the boss forces me to take EF out.

fujiy wrote Dec 6, 2012 at 5:47 PM

Fernando77 wrote Jul 26, 2013 at 1:01 AM

Hi, I am working in the implementation of DbInExpression support in MySql Connector/NET
the problem is that the call to Visit(DbInExpression) is never made in the following query sample:
    List<int> Ages = new List<int>();
    Ages.AddRange(new int[] { 37, 38, 39, 40, 41, 42, 43 });
    var q = from e in context.Employees
            where Ages.Contains(e.Age.Value)
            orderby e.LastName, e.FirstName
            select e;
    string[,] data1 = new string[,] { { "Flintstone", "Fred" }, { "Flintstone", "Wilma" },
      { "Rubble", "Barney" } };
    string query;
    query = q.ToTraceString();
    foreach (var e in q)
    {
      ...
    }
So, my question is, for other database providers is there supposed to be something else beyond implementing Visit( DbInExpression ) in our implementation of DbExpressionVisitor?

Thanks.

ajcvickers wrote Jul 26, 2013 at 1:42 AM

@Fernado77 You will need to return true from your implementation of DbProviderManifest.SupportsInExpression.

Thanks,
Arthur

Fernando77 wrote Jul 26, 2013 at 2:42 AM

Thanks Arthur, that was the missing part.

ajcvickers wrote Jul 26, 2013 at 2:47 AM

@Fernando77 In addition to my comment above I should say that it's great to hear that you are working on a provider. More details about providers for EF6 can be found here: https://entityframework.codeplex.com/wikipage?title=Rebuilding%20EF%20providers%20for%20EF6

Once you have something that people can use we would love to add a link to the list of EF6 providers here: http://msdn.microsoft.com/en-us/data/jj730568

If you have any more questions we will be happy to help. Feel free to start a discussion (https://entityframework.codeplex.com/discussions) if anything isn't clear.

Thanks,
Arthur

Fernando77 wrote Jul 26, 2013 at 4:55 PM

Thanks Arthur & EF team,

As a feedback on this feature, I just implemented it, but found it seems to miss some cases, ie

This LINQ query is correctly optimized to DbInExpression:

Ages.AddRange(new int[] { 37, 38, 39, 40, 41, 42, 43 });
    var q = from e in context.Employees
            where Ages.Contains(e.Age.Value)
            orderby e.LastName, e.FirstName
            select e;
And this too:
Ages.AddRange(new int[] { 37, 38, 39 });
    List<int> Ages2 = new List<int>();
    Ages2.AddRange(new int[] { 40, 41, 42, 43 });
    q = from e in context.Employees
        where (Ages2.Contains(e.Age.Value) && (e.FirstName == "Flintstones")) ||
        (!Ages.Contains(e.Age.Value))
        orderby e.LastName, e.FirstName
        select e;
But this other LINQ query is not:
q = from e in context.Employees
        where (e.Age.Value == 37) || (e.Age.Value == 38) || (e.Age.Value == 39) ||
        (e.Age.Value == 40) || (e.Age.Value == 41) || (e.Age.Value == 42) ||
        (e.Age.Value == 43)
        orderby e.LastName, e.FirstName
        select e;
(we were previously implementing this optimization at the Visit( DbComparisonExpression ) level, so our previous implementation still optimizes both cases).

divega wrote Jul 26, 2013 at 6:01 PM

Hi Fernando,

As you probably know, before we added DbInExpression we used to expand Enumerable.Contains into a tree that looked the same as the tree of the last query. We later performed the optimization of the tree into a IN(..) clause in SQL generation. DbInExpression was introduced specifically to avoid the high costs of tree manipulation in memory we were seeing for Enumerable.Contains with collections with many elements. We do not do anything in the query pipeline to transforms chains of OR-ed equality comparisons into a DbInExpression. As long as this transformation produces nicer SQL or potentially better performing SQL for MySQL my recommendation would be to keep the previous optimization even after you add support for DbInExpression.

Hope this helps.