Schema Information

With this work, the way schema information is retrieved from a provider will be drastically simplified. In addition to reverse engineering a database, new scenarios can also be enabled that leverage this functionality.

Goals

Providers are currently required to do some complex things in order to provide schema information about a database. The goal of this work is to simplify the way provider writers implement this functionality while minimizing the disruption to existing providers.

Current Limitations

The way a provider writer currently implements this functionality is by authoring SSDL and MSL files that, together with a predefined CSDL, allow conceptual-level queries to be translated into database-specific queries for database schema-related information. We have heard from a few provider writers that this is one area of the provider model that could be improved.

The CSDL, which is defined by the framework, is unnecessarily complex. It has multiple entity-sets per type (MEST) and a significant number of entity-sets, types, and properties are not used by the primary consumer of this functionality – Reverse Engineer.

Another aspect of the current implementation that complicates things is the assumption that databases can readily provide this information using only SQL queries. Oracle, for example, can provide this information using SQL; however, the shape of the data is significantly different and transforming it into the shape required by the CSDL is not a trivial task using just SQL. When the required information is not available via SQL, as is the case with SQLite, elaborate steps have to be taken to enable this functionality such as creating and populating tables on-the-fly when the information is requested.
Finally, the only way currently to consume this functionality is by constructing an EntityConnection object using the CSDL, MSL, and SSDL artifacts and then issue ESQL queries against it.

Design

A new method will be added to the provider:

public virtual SchemaInformation GetStoreSchema(DbConnection connection)

This method will return the schema information for a given database connection. Provider writers can override this method and return the schema information however is most appropriate for them. This could be by...
  • Issuing SQL queries and manually materializing the SchemaInformation object.
  • Returning a lazy SchemaInformation object that queries for information and materializes the results as it is requested.
  • Returning a SchemaInformation object that is backed by a DbContext object which handles querying and materialization for them.
  • Retrieving the information through some other means such as a database-specific API call.

SchemaInformation is an abstract class that includes three strongly-typed IQueryable properties: TablesAndViews, Routines, and Constraints.

The SQL Server and SQL Server Compact providers will use a DbContext-backed implementation of SchemaInformation. This implementation will be easily configurable by other providers and should be the first option considered when implementing the GetStoreSchema method.

The following is a simplified class diagram representing the object model exposed by the SchemaInformation class.

TablesAndViews.png
Routines.png
Constraints.png

The implementation of this model will be shared with the Customizable Code First Conventions feature.

Additional Scenarios

Using this revised provider functionality, other areas of the framework could be enhanced.

The CompatibleWithModel method on Database could be extended to check compatibility against the actual database schema if the model metadata is not available. This functionality would flow to other parts of the framework like the CreateDatabaseIfNotExists initializer.

A GetSchema method could be added to the Database class allowing users to query for schema information directly from the DbContext API.

API Usage

If we do expose a method on the Database class, retrieving schema information would be done as follows.

var schema = context.Database.GetSchema();

Here are some of the queries that could then be issued against the schema object.

// What tables/views are in the database?
var tableNames = schema.TablesAndViews.Select(t => t.Name);

// What TVFs are available?
var tvfs = schema.Routines.OfType<TableValuedFunction>();

// What tables reference Products?
var productsTable = schema.TablesAndViews.Single(t => t.Name == "Products");
var referencingTables = schema.Constraints.OfType<ForeignKeyConstraint>()
    .Where(fkc => fkc.ForeignKeys.Any(fk => productsTable.Columns.Contains(fk.ToColumn)))
    .Select(fkc => fkc.Parent);

Challenges

In order to maintain compatibility with existing providers, a default implementation of the GetStoreSchema method will be provided. This default implementation will return a SchemaInformation object that retrieves information via the previous CSDL, MSL, and SSDL mechanism.

To avoid exposing any bugs in a provider’s current implementation, the information will only be retrieved using queries that are familiar to the provider. These queries will be the same ones that are currently sent to the provider during reverse engineer.

Last edited Aug 10, 2012 at 1:07 AM by BriceLambson, version 2

Comments

danam Sep 19, 2012 at 2:58 AM 
How about having SchemaInformation represented as a DbContext directly and its object model as DbSet and entities classes?
That would feel quite natural and consistent besides opening up interesting scenarios.
The only challenge would be to represent the following attributes:
-ClrEquivalentType
-DefaultValue
-MaxValue
-MinValue
-Value
defined on:
-EdmProperty
-EnumMember
-Facet
-FacetDescription
-MetadataProperty
-PrimitiveType
as they're System.Type or Sytem.Object. But that should not be too hard to circumvent.