1

Closed

EF6 String Concatenation SQL Does Not Honor UseDbNullSemantics

description

There appears to be a problem in the SQL generation logic for Entity Framework 6, particularly related to its null string concatenation behavior. This behavior appears to have changed in EF6 due to the changes implemented in work item 2023.

In EF5, the following LINQ query:
var query = from r in ctx.base_tbl
                        join lkp in ctx.lkp_tbl on r.lkp_name_cd equals lkp.lkp_name_cd into lkpOut
                        from lkp in lkpOut.DefaultIfEmpty()
                        select new { r.name, string_concat_field = lkp.lkp_name_cd + " - " + lkp.dspl_desc };
would have produced SQL similar to:
SELECT 
    1 AS [C1], 
    [Extent1].[name] AS [name], 
    [Extent2].[lkp_name_cd] END + N' - ' + [Extent2].[dspl_desc]
    FROM  [dbo].[base_tbl] AS [Extent1] AS [C2]
    LEFT OUTER JOIN [dbo].[lkp_tbl] AS [Extent2] ON [Extent1].[lkp_name_cd] = [Extent2].[lkp_name_cd]
In EF6, the following SQL is produced:
SELECT 
    1 AS [C1], 
    [Extent1].[name] AS [name], 
    CASE WHEN ([Extent2].[lkp_name_cd] IS NULL) THEN N'' ELSE [Extent2].[lkp_name_cd] END + N' - ' + CASE WHEN ([Extent2].[dspl_desc] IS NULL) THEN N'' ELSE [Extent2].[dspl_desc] END AS [C2]
    FROM  [dbo].[base_tbl] AS [Extent1]
    LEFT OUTER JOIN [dbo].[lkp_tbl] AS [Extent2] ON [Extent1].[lkp_name_cd] = [Extent2].[lkp_name_cd]
It was my understanding that if the DbContext.DbContextConfiguration.UseDbNullSemantics flag is set, then the simpler SQL query should be produced.

I have a sample project that I can provide upon request, but it is too large to upload.
Closed Oct 31 at 8:12 PM by lukew
Verified Fixed

comments

lukew wrote Aug 20 at 12:48 AM

Notes for Triage: I've done some investigation on this issue and I have a repro project. This looks like a product bug in that, as stated above, the UseDbNullSemantics flag isn't honored.

When I stepped into Query Generation the flag wasn't passed all the way through and a default value of UseDbNullSemantics = false was used

moozzyk wrote Aug 20 at 5:01 PM

I believe it was a conscious decision. If you concat NULL with anything the result is always NULL e.g.: SELECT NULL + 'Abc' returns NULL however if you do the same in C#: Console.Write(null + "Abc"); you will get 'Abc'. Since the original linq query is in C# the current behavior is actually what the code would do if it was a Linq to Objects query (or just pure concat). The previous behavior did what was really not expected. If you want the old behavior you can be explicit about this in your query. I think something like this should do the trick:

string_concat_field = (lkp.lkp_name_cd == null || lkp.dspl_desc == null) ? null : lkp.lkp_name_cd + " - " + lkp.dspl_desc

brebok wrote Aug 20 at 6:17 PM

I understand what you are saying. However, the purpose of the UseDbNullSemantics flag is to control this behavior. If it is set to true, it should try to concatenate NULL and 'Abc' together as a database would, which will return NULL. If you want it to behave like a C# in-memory LINQ query, set the flag to false.

As it stands, this is a breaking change in behavior, as we had many queries that relied on the previous behavior using EF5. I understand that you can use the syntax in the LINQ query to accomplish the same thing, but this creates a more complex Expression Tree for the query and can significantly slow down initial query compilation time if there are many of these expressions in the select clause. We already have queries that are complex and unacceptably slow for first-time compilation. I don't want to slow them down anymore.

emilcicos wrote Aug 20 at 8:51 PM

This can probably be fixed in StringTranslatorUtil.StripNull (apparently added after EF 5), by not creating the CASE statement in case of DB null semantics. Also we could look and try to determine why the expression generated from "x != null && y != null ? x + y : null" does not get simplified by the transformation rules.

emilcicos wrote Aug 20 at 9:00 PM

The current UseDatabaseNullSemantics flag is basically the negation of the EF5 UseCSharpNullComparisonBehavior and it should honored only for equality comparisons '==', '!='. One of the reasons it was renamed to UseDatabaseNullSemantics (instead of ...ComparisonSemantics) was that in the future it could be honored for cases other than equality comparisons.

emilcicos wrote Sep 12 at 11:37 PM

commit 2bae908ad3f457eaee6cb2f7e2982b39580991d4

lukew wrote Sep 22 at 9:16 PM

This bug doesn't appear to be completely fixed. While trying to reproduce the customer's query I created a query that wouldn't compile with UseDbNullSemantics set to true but does work when set to false. Both tests below should pass.
    public class CodePlex2457 : FunctionalTestBase
        public class Customer
        {
            public int Id { get; set; }
            public string FirstName { get; set; }
            public string LastName { get; set; }
            public int ContactInfoId { get; set; }
        }

        public class ContactInfo
        {
            public int Id { get; set; }
            public int CustomerId { get; set; }
            public int PhoneNumber { get; set; }
            public string Address { get; set; }
        }

        public class Context : DbContext
        {
            static Context()
            {
                Database.SetInitializer<Context>(null);
            }

            public DbSet<Customer> Customers { get; set; }
            public DbSet<ContactInfo> ContactInfos { get; set; }
        }

        [Fact]
        public void Concatenated_strings_use_CSharpNullSemantics_for_table_joins()
        {
            using (var context = new Context())
            {
                context.Configuration.UseDatabaseNullSemantics = false;

                var query = from r in context.Customers
                            join lkp in context.ContactInfos on r.ContactInfoId equals lkp.CustomerId into lkpOut
                            from lkp in lkpOut.DefaultIfEmpty()
                            select new { r.FirstName, string_concat_field = lkp.PhoneNumber + " - " + lkp.Address };

                QueryTestHelpers.VerifyDbQuery(
                    query,
@"SELECT
    [Extent1].[ContactInfoId] AS [ContactInfoId],
    [Extent1].[FirstName] AS [FirstName],
    CASE WHEN([Extent2].[PhoneNumber]IS NULL) THEN N'' ELSE CAST([Extent2].[PhoneNumber] AS nvarchar(max)) END + N'-' + CASE WHEN([Extent2].[Address]IS NULL) THEN N'' ELSE[Extent2].[Address]END AS [C1]
    FROM[dbo].[Customers]AS[Extent1]
    LEFTOUTERJOIN [dbo].[ContactInfoes] AS [Extent2] ON [Extent1].[ContactInfoId] = [Extent2].[CustomerId]");
            }
        }
        
        [Fact]
        public void Concatenated_strings_use_SQLNullSemantics_for_table_joins()
        {
            using (var context = new Context())
            {
                context.Configuration.UseDatabaseNullSemantics = true;

                var query = from r in context.Customers
                            join lkp in context.ContactInfos on r.ContactInfoId equals lkp.CustomerId into lkpOut
                            from lkp in lkpOut.DefaultIfEmpty()
                            select new { r.FirstName, string_concat_field = lkp.PhoneNumber + " - " + lkp.Address };

                QueryTestHelpers.VerifyDbQuery(
                    query,
@"SELECT
    [Extent1].[ContactInfoId] AS [ContactInfoId],
    [Extent1].[FirstName] AS [FirstName],
    CAST([Extent2].[PhoneNumber] AS nvarchar(max)) END + N'-' + [Extent2].[Address]END AS [C1]
    FROM[dbo].[Customers]AS[Extent1]
    LEFTOUTERJOIN [dbo].[ContactInfoes] AS [Extent2] ON [Extent1].[ContactInfoId] = [Extent2].[CustomerId]");
            }
        }
    }

emilcicos wrote Sep 23 at 2:21 AM

That happens because of this:

lkp.PhoneNumber + " - " + lkp.Address

PhoneNumber is an int. C# allows that syntax, and converts the integer to a string. SQL (at least on SqlServer) does not allow that. Something like:

select PhoneNumber + 'abc' from ContanctInfo

fails with:

Conversion failed when converting the varchar value 'abc' to data type int.

Considering that we have DatabaseNullSemantics = true, this can be considered by design.

Arguably we can probably try to do something better, but I am not sure it is worth spending the time, because the issue can be worked-around by using SqlFunctions.StringConvert.

emilcicos wrote Sep 23 at 8:03 PM

Actually Luke's find is a good catch. The fix has a typo: "return inputExpression" instead of "return outputExpression", and that is causing the issue he is seeing.

lukew wrote Sep 26 at 6:35 PM

Fixed in d91a484c79c3 switching return value in one case to the output expression as opposed to input expression