Linq-to-Sql - what are the implications?
Craig has been posting on the topic a bit lately, and innitially I wanted to respond to his latest post with a comment; it got so long that it warranted a post on itself.
For those uninitiated ones, Linq-to-Sql is what used to be called DLinq - this will replace the ADO.NET code we currently use to access data in a relational database. It's important to note that Linq-to-Sql only supports SQL Server as a database. There are many reasons for this, one being that SQL code is generated by the framework, and to optimise data-access, Linq will generate T-SQL. (Other vendors, Oracle for example, can create a Linq-to-Oracle library if they wish - here's an example of an open-source implementation for MySql).
Just as Craig, I was also quite sceptical about LINQ when I heard about it at first - the intermingling of queries with my code reminded me a lot of Foxpro, which (no offense!) really isn't my cup of tea. I was pleasantly surprised when I investigated LINQ and realised how brilliant its OO design is. Craig said:
I am a firm believer in the current database development best practice of providing a stored procedure interface to a database to provide a layer of abstraction, increase maintainability and allow for performance improvement through database query optimisation and tweaking. Sometimes this includes blurring the ideals of having the business logic implemented completely outside of the database in the hypothetical business logic layer or object workspace.
I completely agree with him. Actually, the "blurring" of boundaries betweeb data-layers and business-layers is a major problem - we all know, when using stored procedures (sprocs) (as great as the set-based advantage is), how difficult it is to keep business logic out of SQL code. Before you know it, your sprocs are littered with complex logic that gets difficult to maintain, and completely invisible to calling code. I know it doesn't have to happen, but it inevitably does in reality. To sum up my feelings: Linq will change the way we programmatically interact with data for (I am tempted to say for EVER EVER) at least the next few years, and I think other programming domains (java?!) will catch onto this idea eventually.
The use of stored procs for CRUD operations has been advocated by Microsoft for many years, and, they were in fact wrong. It will be difficult (and interesting to see) for them to try and correct this perception from now onwards!
Designing Queries:
Linq overcomes these problems completely. First of all, if you *really* prefer to use stored procedures, you can easily do so. Here's a screenshot of the Dlinq
designer - note that you can specify stored procedures, or even your own command-text, for each CRUD operation.
These entities are created from the old Northwind database, simply by dragging tables from the Server Explorer panel onto the designer surface.
Of course, this designer diagram represents auto-generated code, which is updated in the background as you tweak this visual representation. Furthermore, the generated classes are partial, so if you wanted to specify your own behaviour (like a GetMostValuedCustomers method) on any of these classes, you can do so in your own seperate partial class, so as not to have it overwritten by the designer. The designer generates a DataContext, as well as the Entity classes seen here. It also generates the relationship depicted.
Methods can also be auto-generated by dragging sprocs onto the designer surface - this gives you the immediate advantage of strong-typed calling without having to go do the plumbing of converting SqlDbTypes to .net types.
So I can query my Linq object model as follows:
1: Northwind northwind = new Northwind();
2: {
3: var customerProjections = from c in northwind.Customers
4: where c.City == "London"
5: orderby c.Orders.Count descending
6: select new { Name = c.ContactName, City = c.City, NumberOfOrders = c.Orders.Count };
7:
8: if(someBooleanExpression)
9: {
10: foreach(var customerProjection in customerProjections)
11: {
12: //Do something with customerProjection
13: }
14: }
15: }
The actual query that gets generated and sent to SQL Server by this code looks as follows:
1: SELECT [t0].[ContactName], [t0].[City], (
2: SELECT COUNT(*)
3: FROM [Orders] AS [t2]
4: WHERE [t2].[CustomerID] = [t0].[CustomerID]
5: ) AS [NumberOfOrders]
6: FROM [Customers] AS [t0]
7: WHERE [t0].[City] = 'London'
8: ORDER BY (
9: SELECT COUNT(*)
10: FROM [Orders] AS [t1]
11: WHERE [t1].[CustomerID] = [t0].[CustomerID]
12: ) DESC
Performance
Firstly, a Linq data query only executes as soon as its members are accessed. This means that only in line 11 in the C# code above, will a SQL Connection be opened and the listed TSQL be executed upon the database. The SQL Connection is closed immediately when leaving the scope of the statement that caused it to be opened - this means line 15 in our C# listing. (This behaviour is known as deferred execution).
Deferred execution allows you to reuse your queries - each time you access members of the query, the query will actually be refreshed from the database, which means that you're never working with stale instances, decreasing the possiblity of concurrency issues. (Of course you have control over this, and the roundtrip doesn't have to happen every time).
Identity
The DataContext object has a tracking service that tracks identity for each object. This means that, when your LINQ query retrieves an object that has the same conceptual identity as an object that is already in memory, a pointer to this object is returned. For example: if I retrieve all customers in London, I might end up with a list that includes a customer with ID '4'. If I now issue a query for all customers with ID '4', I will get a list with an object that points to the same instance in memory as customer '4' in my first result.
Linq will throw a OptimisticConcurrencyException in case of a concurrency violation, without losing any of the in-memory changes to the object model that you tried to persist. This enables you to react appropriately:
1: while (retries < maxRetries)
2: {
3: // fetch objects and make changes here
4: try
5: {
6: northwind.SubmitChanges();
7: break;
8: }
9: catch (OptimisticConcurrencyException e)
10: {
11: retries++;
12: }
13: }
More observations
Linq also supports SQL User-defined Functions (UDFs), stored procedures, database creation (great at installation-time), and more. No valuable performance figures are available at this stage as the product is still changing constantly.
Just to end off with, I thought I'd post a preview of what will be possible with Linq; data queries are purely conceptual, and completely removed from the data's representation. The following statement selects data from a database, and transforms it into an XML document (note how easy it is to visualize the resulting XML document from the Linq-to-XML syntax):
1: var e = new XElement(“results”,
2: from c in db.Customers
3: where c.City == “London”
4: select new XElement(“customer”,
5: new XElement(“name”, c.ContactName),
6: new XElement(“phone”, c.Phone)
I'll be posting more on the Linq Project shortly, as well as do some talks in Cape Town for sadeveloper.net - keep an eye on their site for more details.