1
Vote

Nullable comparison query generation

description

Hi guys,
I have wrote sample code to test sql generation
                DateTime? dateTime = DateTime.Now;
                ctx.Transactions.Where(t => t.ShippedDate == dateTime).Load();
                dateTime = null;
                ctx.Transactions.Where(t => t.ShippedDate == dateTime).Load();
Two quries generates same SQL:
SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[ProductId] AS [ProductId],
    [Extent1].[OrderLineId] AS [OrderLineId],
    [Extent1].[TransactionDate] AS [TransactionDate],
    [Extent1].[ShippedDate] AS [ShippedDate]
    FROM [dbo].[Transactions] AS [Extent1]
    WHERE ([Extent1].[ShippedDate] = @p__linq__0) OR (([Extent1].[ShippedDate] IS NULL) AND @p__linq__0 IS NULL))
This query will suffer Parameter Sniffing Problem.

Best way is generate different quries:
-- FOR NULL VALUE
SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[ProductId] AS [ProductId],
    [Extent1].[OrderLineId] AS [OrderLineId],
    [Extent1].[TransactionDate] AS [TransactionDate],
    [Extent1].[ShippedDate] AS [ShippedDate]
    FROM [dbo].[Transactions] AS [Extent1]
    WHERE [Extent1].[ShippedDate] IS NULL

-- FOR NON NULL VALUE
SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[ProductId] AS [ProductId],
    [Extent1].[OrderLineId] AS [OrderLineId],
    [Extent1].[TransactionDate] AS [TransactionDate],
    [Extent1].[ShippedDate] AS [ShippedDate]
    FROM [dbo].[Transactions] AS [Extent1]
    WHERE [Extent1].[ShippedDate] = @p__linq__0

comments

RoMiller wrote Jul 3 at 6:34 PM

EF Team Triage: We agree that this would be better SQL to generate. Implementing it on the EF6.x code base isn't really practical because the shape of LINQ query to shape of SQL query is essentially a 1:1 mapping and we don't have the concept of changing query shape based on parameter values.

We do think this would be much more realistic on the EF7 code base we are working on and so we're putting this item in the Future release so that we can look at it for EF7.

divega wrote Jul 4 at 10:54 AM

Some additional notes for future consideration:

The suggestion can be interpreted as changing EF to perform its own parameter sniffing and to produce alternative target queries depending on the parameter values found, in particular for each nullable parameter we would produce two different version of the query, one for null and another one for non-null values.

This potentially means tracking in the cache 2 ^ n simpler alternative target queries for the same source query with n parameters (the target queries could of course be added lazily to the cache as different parameter values are observed), supposedly giving the database server an opportunity to pick a more efficient plan for each of them as opposed to having to pick a plan based on a more complex query and running the risk of picking a plan on first execution that isn't good for further executions due to having parameter sniffing.

This might work by helping produce simpler queries, and even helping reduce the negative impact of parameter sniffing for a subset of cases, i.e. for those cases in which null vs. non-null values can determine different plans, but we would really need to consider the benefits vs. the drawbacks very carefully, e.g.:
  1. It increases the number of queries to be maintained in the query cache, both at the EF level and at the database server level.
  2. Currently we are taking into account the nullability of both sides of a comparison and whether the comparison expression is negated (with NOT) to decide the minimal set of terms to expand the comparison into. We would need to understand how the suggested approach works in all the cases, in particular for the negated case.
  3. The approach does not help for those cases in which the distribution of non-null values isn't even and therefore the optimal plan can be different for different non-null values. In fact, from the parameter sniffing perspective this seems to be just a SQL Server specific optimization. E.g. of other major database servers that perform parameter sniffing, Oracle (which calls the feature "bind variable peeking" :)) shouldn't need this because from Oracle11g on they have implemented "adaptive cursor sharing", which detects if the query plan is sensitive to parameter values and will create and cache alternate plans for the same query and those different values.
My take is that while the null vs. non-null heuristic might have benefits (we would need to measure it) EF cannot compensate for all the limitations of SQL Server's parameter sniffing by implementing its own limited form of parameter sniffing: SQL Server has its histograms and cardinality estimators, but EF will never have enough knowledge. As long as SQL Server isn't improved to maintain multiple plans for the same query and different values we may be better off giving our users the ability to disable parameter sniffing, e.g. by generating OPTION(OPTIMIZE FOR UNKNOWN) or by telling them to set trace flag #4136.

gandjustas wrote Jul 4 at 6:15 PM

OPTION(OPTIMIZE FOR UNKNOWN) is not a solution where a lots of records in table have null values for the field, and a small part have non-null values. In this case first call with null parameter will produce plan with Scan, and it will be cached for all executions. Same for OPTION(OPTIMIZE FOR UNKNOWN). This plan will be worst plan for non-null value.

There is no way in SQL Server to have different plans for same query. But in application you can produce different quires for different values, but this makes application more database engine\version specific. It's not good for universal data access library.

I think database engine-specific query optimizations should be built into providers for particular DBMS.