February 2008 - Posts
I blogged a while ago about how I'm not comfortable with using a base class that all of your pages inherit from in ASP.Net. In .net 1.x this was really the only way to enforce behaviour and share functionality but it meant all of the developers on the team had to remember to do it, and it was a manual task. There are ways around this, like overriding the default templates in Visual Studio.
Of course, you can still choose to use a base page class in ASP.Net 3.5, one that inherits from System.Web.UI.Page, and then change all of your code-behind classes to inherit from it. From an OO perspective, that is definitely cleaner, but I think the approach below is less invasive approach in the web development lifecycle.
Part of this approach is to use PageAdapters, and I covered this in my post Injecting a Page Base Class in ASP.Net. This approach worked for some things, but sometimes you want to just offer shared functionality, and in ASP.Net 2.0 this still really meant a base page. In ASP.Net 3.5, we have now have Extension methods, and we can use these by creating extension methods that apply to System.Web.UI.Page, kind of like the following:
using System.Web.UI;
namespace TestExtensions
{
public static class PageExtensions
{
public static void WriteHelloWorld(this Page page)
{
page.Response.Write("Hello From Partial <br/>");
}
}
}
Then we can just use these methods directly in our pages, like so:
namespace TestExtensions
{
public partial class _Default : ActualBasePage
{
protected void Page_Load(object sender, EventArgs e)
{
this.WriteHelloWorld();
}
}
}
(by the way, thanks Craig for reminding me about this and putting it to good use :->)
When you add a new class to a project in C#, by default the class is declared as internal (well, to be clear, it doesn't set an access modifier, which means it defaults to internal), as below:
namespace ClassLibrary1
{
class Class1
{
}
}
However, most of the time I want my classes to be public. In fact, increasingly I want my classes to be partial as well, as follows:
namespace ClassLibrary1
{
public partial class Class1
{
}
}
So, to change the default behaviour, I edited the default C# template for the 'class' item. If you'd like to use my new version, you can download it here. Usually, when you add a new template to Visual Studio, you only have to drop it into My Documents\[Visual Studio version folder]\Templates\ItemTemplates, e.g. My Documents\Visual Studio 2008\Templates\ItemTemplates. However, because I'm overriding the default 'class' template, and I still want the Add .. New Class menu item to be hooked up correctly, I dropped the class.zip file into
C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\ItemTemplates\CSharp\Code\1033
(if you're not running a 64-bit OS, you can drop off the '(x86)' part of the folder name. If the template doesn't immediately become available, you might need to drop it into
C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\ItemTemplatesCache\CSharp\Code\1033
In addition, if it doesn't let you change the file in the first folder, you will need to shut down Visual Studio because it locks the folder.
I was trying to load a solution which contains a web project in with Visual Studio 2008 and I was getting a very helpful error:
System.Runtime.InteropServices.COMException, as below:
Of course, this is a very helpful error which told me exactly what I needed to know. I then tried to open just the web project on it's on with Visual Studio, and I finally got a better error message, telling me that it "is configured to use IIS".
I could have configured it to use IIS, I guess, but I'm happier running with the development web developer server anyway. What I had to do was change the proj file's xml by loading it in notepad, and changing the setting at the bottom:
<ProjectExtensions>
<VisualStudio>
<FlavorProperties GUID="{[whatever guid]}">
<WebProjectProperties>
<UseIIS>True</UseIIS>
<AutoAssignPort>True</AutoAssignPort>
<DevelopmentServerPort>61521</DevelopmentServerPort>
<DevelopmentServerVPath>/</DevelopmentServerVPath>
<IISUrl>http://localhost/MyWebsite</IISUrl>
<NTLMAuthentication>False</NTLMAuthentication>
<SaveServerSettingsInUserFile>False</SaveServerSettingsInUserFile>
</WebProjectProperties>
</FlavorProperties>
</VisualStudio>
</ProjectExtensions>
You need to change the "UseIIS" setting to False. Then save the project and reopen it, and all works fine.
A few days ago I put up a list of Getting Started with LINQ To SQL Links. Chris Rock (not the comedian) has another good list up as well.
[This is part 2 of an ongoing series. View Part 1 - Performance Considerations]
A short while ago I posted about the fact that I'm currently working on a real-world enterprise project using LINQ to SQL. I mentioned that the choice of LINQ To SQL was based on a number of criteria (in priority order):
Performance Security Development Time and Effort Maintenance Time and Effort Stability In the previous post, we discussed the first topic - performance considerations in LINQ To SQL - in some detail. In this post we will continue this evaluation by looking into the topic of security in terms of this new OR/M. As before, we will break down the discussion into a number of specific sub topics.
A note before we begin: Security is a critical topic, as we're all used to hearing about. However, recall that our focus is on LINQ To SQL, and the impact it may or may not have on security, specifically in an ASP.Net context. While the topics are important, we will not discuss topics like Cross Site Scripting, Cross Site Request Forgery, AJAX Security (the last link, for those who're interested, is to my DevDays 2007 slide deck on ASP.NET AJAX Security). If you're interested in those topics, there are some great links available (1) (2). In terms of LINQ to SQL security though, it's only fair to compare apples with apples, so to speak, so we need to bare in mind we're dealing only with data access layer security.
Stored Procedures
We had quite an intense look into performance related issues around Stored Procedures in the last post because LINQ To SQL favours inline (generated) SQL instead. Recall that LINQ To SQL can easily make use of stored procedures for queries and for updates.. Incidentally, one point I intended to mention on the topic is that, in terms of the argument that DBAs can tweak sprocs to improve DB performance, the truth is that arguably the primary and most common gain in performance that a DBA does most regularly relates to maintaining and improving indexing on the database, not managing CRUD sprocs. Sprocs are coming up here again because probably the second most common argument in favour of sprocs is the supposed security they provide. As Jeff Atwood puts it (in a blog post NOT in favour of stored procs, he's just putting forward the arguments we're all used to hearing):
Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.
This is one of the two primary security arguments put forward in favour of sprocs. The other relates to parameterised SQL queries (effectively overcoming a SQL injection attack). However, without rehashing another old topic too much, this is completely overcome (from a database CRUD perspective at least) with simply using parameterised queries, so I'm not even going to give this point any more airtime.
Back to the "individually securing areas of the database" with sprocs argument. Now, in my mind this is more theoretical because I find in practice it holds little water. On the one hand, I quite commonly will hear clients espouse this concern as a reason to use sprocs, then simply go and use an 'sa' or equivalent account for the data access anyway! In addition, the majority of the systems I've seen with sprocs follow something like the following. The have a table (e.g. Customers), then have at least the following sprocs, following a very strictly defined naming convention:
- spCustomer_Insert
- spCustomer_Update
- spCustomer_Delete
- spCustomer_GetByID
- spCustomer_GetAll
- spCustomer_Get... (numerous of these)
Now, if an attacker was able to get hold of the text of your inline SQL calls, and from there possibly derive your schema and wreak havoc, they could just as easily then see the name or names of these sprocs and quickly derive your common naming convention! This will pretty much give him the same access if he would have had anyway! If you're concerned about internal security (i.e. developers not being able to access the tables directly) the situation is even worse for the same reason. I've been there working on an emergency late at night with NO TABLE ACCESS because we were "NOT allowed to be modifying the database without approval" and I simply called the correct sprocs in the correct order (don't worry, I had approval, we just couldn't get hold of the DBA). If you are legitimately worried about specific operations on specific tables (which you should be), you can just as easily create an account and set the specific CRUD rights on the table. With proper user schema separation in SQL 2005 this allows you to lock down to specific rows and specific column-level security, really giving the granularity that was argued from a sproc perspective.
SQL Connection
Once again, this is not a LINQ To SQL-specific issue, but please remember to use a secure database connection string, regardless of your data access technique. If you're not sure how easily it is to do this, you can check out my How Do I: Encrypt My Web.Config File? video on MSDN. Again, please take this as a very important point. It's so important, in fact, that it is (laughably) the only point that Microsoft lists in the MSDN library under Security in LINQ to SQL. To quote 50% of the "article":
Security risks are always present when you connect to a database, especially if the password is clearly readable in the connection string.
Application Architecture Security Concerns
As I've mentioned a few times above, the issues we've discussed above under "LINQ To SQL" security have really been around in various data access approaches and OR/Ms. Arguably, we now come to something more unique to LINQ To SQL. The nature of how LINQ To SQL implements its classes and the datacontext, because of how the validation works and also because they are your business classes, means that you are forced to mix certain aspects of your layering more closely than might otherwise be the case. For instance, if you look at the Product class generated by the LINQ To SQL Designer, you will find the following code shortly after the class definition:
#region Extensibility Method Definitions
partial void OnLoaded();
partial void OnValidate(System.Data.Linq.ChangeAction action);
partial void OnCreated();
partial void OnProductIDChanging(int value);
partial void OnProductIDChanged();
partial void OnProductNameChanging(string value);
partial void OnProductNameChanged();
partial void OnSupplierIDChanging(System.Nullable<int> value);
partial void OnSupplierIDChanged();
partial void OnCategoryIDChanging(System.Nullable<int> value);
partial void OnCategoryIDChanged();
partial void OnQuantityPerUnitChanging(string value);
partial void OnQuantityPerUnitChanged();
partial void OnUnitPriceChanging(System.Nullable<decimal> value);
partial void OnUnitPriceChanged();
partial void OnUnitsInStockChanging(System.Nullable<short> value);
partial void OnUnitsInStockChanged();
partial void OnUnitsOnOrderChanging(System.Nullable<short> value);
partial void OnUnitsOnOrderChanged();
partial void OnReorderLevelChanging(System.Nullable<short> value);
partial void OnReorderLevelChanged();
partial void OnDiscontinuedChanging(bool value);
partial void OnDiscontinuedChanged();
#endregion
This code defines the partial methods that will be called at appropriate times during the lifetime of your object. For instance, the OnLoaded gets called when the object is loaded. Similarly, OnValidate gets called when an object is going to be saved to the database. The matching On[Property]Changing and On[Property]Changed methods of course get called during the set of these properties. Aside from the individual tables, the datacontext itself has various methods defined:#region Extensibility GetHealthy Definitions
partial void OnCreated();
partial void InsertCategory(Category instance);
partial void UpdateCategory(Category instance);
partial void DeleteCategory(Category instance);
partial void InsertProduct(Product instance);
partial void UpdateProduct(Product instance);
partial void DeleteProduct(Product instance);
partial void InsertCustomer(Customer instance);
partial void UpdateCustomer(Customer instance);
partial void DeleteCustomer(Customer instance);
partial void InsertOrder(Order instance);
partial void UpdateOrder(Order instance);
partial void DeleteOrder(Order instance);
partial void InsertOrder_Detail(Order_Detail instance);
partial void UpdateOrder_Detail(Order_Detail instance);
partial void DeleteOrder_Detail(Order_Detail instance);
#endregion
In order to get these methods working, you need to put the code into partial classes that exist in the same project. For the Product class, this means it must be in the same project as the Product class definition created on the designer. An example of this appears below:using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace WhyLinQ
{
public partial class Product
{
partial void OnValidate(System.Data.Linq.ChangeAction action)
{
//do some validation logic here
}
}
}
This is turn means that the validation code, which is part of the business logic, needs to sit in the same layer as the database access code. From a security perspective, this limits how these two concerns can be physically separated. In my experience, separating the business and data layers is rarely implemented, certainly not for security reasons. It is far more common to split the business and UI layers, for instance behind a service, so this concern is perhaps not a big one.
With that we've covered the 2nd primary concern in evaluating LINQ To SQL. Next up, and time-permitting, I'll be looking into what concerns we had relating to developer productivity and effort, both in terms of using LINQ as well as in terms of the ramp up it requires.
Considering I've put a few posts up about LINQ To SQL, I realised I've never shared some of the initial links I passed to other team members as well. If you're looking into using this technology, here are some worthwhile links to follow. Some are tutorials, some are tools, some are just worthwhile pieces to understand with this new technology. Also, some are more general LINQ-related than LINQ To SQL specifically.
Essential Reading
Essential Tools
Worthwhile Additional Info
I found a nice series of tips for releasing ASP.Net apps to production via Scott Guthrie's links blog post. It's definitely a worthwhile read if you're working with ASP.Net
For some follow up, Scott Mitchell has a great post on the App_offline tip (tip 7) with even more details than Scott Guthrie's post. Also, if you're interested in seeing easily how to encrypt your web.config, check my How Do I: Encrypt My Web.Config File? video
A good while ago I posted about using Path.Combine to combine two paths into 1 final path. For instance, if you have a folder portion you've extracted from somewhere (maybe a root path for something you're storing in the config file, like 'c:\temp') and a final filename (like 'myfile.txt', Path.Combine("c:\temp", "myfile.txt") will join the two for a resulting filename: c:\temp\myfile.txt
There are a few interesting combinations of what will and won't combine correctly, and the MSDN page for Path.Combine explains some of these. There's one condition it won't cater for properly though - if the 2nd parameter contains a leading '\', for instance '\myfile.txt', the final result will ignore the first parameter. The output of Path.Combine("c:\temp", "\myfile.txt") is \myfile.txt'. This is not the case if the 1st parameter contains a trailing '\'. See the following for more info:
string part1 = @"c:\temp";
string part2 = @"assembly1.dll";
(1) Console.WriteLine(Path.Combine(part1, part2));
(2) Console.WriteLine(Path.Combine(part1 + @"\", part2));
(3) Console.WriteLine(Path.Combine(part1 + @"\", @"\" + part2));
(4) Console.WriteLine(Path.Combine(part1, @"\" + part2));
The output of this is
(1) c:\temp\assembly1.dll
(2) c:\temp\assembly1.dll
(3) \assembly1.dll
(4) \assembly1.dll
So the moral is - check your 2nd path for a leading '\'.
I've just finished working on a new How Do I video for Visual Studio Extensibility on the topic of T4 Code Generation [Update: the video is now live at How Do I: Create And Use T4 Templates?]. T4, or Text Template Transformation Toolkit, is the free code generation engine from Microsoft that underpins their Domain Specific Languages and Software Factories toolkits. T4 is usually used only to generate code from the models in your DSL, but it's a pretty rich code generation engine in it's own right. I've played a lot over the years with CodeSmith and MyGeneration, so I've been meaning to play with T4 for a while. What's nice about the new version is that it is now built into Visual Studio 2008 - no need for the SDK to get it any more.
If you're interested in finding out more about how T4 works, and to get some great samples, check out Oleg Sych's blog.
One thing I noticed when working with T4 though, is that while the ".tt" file extension works in Visual Studio, there is no item template when you Add .. New Item to your project. I've put 2 Item Templates together to overcome this - one for a simple T4 template and one for an SMO Database-driven template (thinking of replacing the code generator for LINQ To SQL?). You can download the simple T4 C# item template here and the T4 C# Database item template here. To use the templates, simply drop the files into [My Documents]\Visual Studio 2008\Templates\ItemTemplates\
By the way, if you're working with and creating T4 templates, make sure you check out the great free T4 Editor from Clarius Consulting.
In a recent post, I discussed using the LoadWith method on the DataLoadOptions in LINQ To SQL. This method allows you to explicitly specify which objects must be automatically retrieved from the database and it can make an important difference to the performance of your LINQ To SQL application.
Recall that the usage of LoadWith was something like this:
using (NorthwindDataContext dc = new NorthwindDataContext())
{
dc.Log = Console.Out;
System.Data.Linq.DataLoadOptions dl = new System.Data.Linq.DataLoadOptions();
dl.LoadWith<Category>(c => c.Products);
dc.LoadOptions = dl;
var categories = from c in dc.Categories
select c;
LoadWith, as you can see, is a method on DataLoadOptions, a property on the DataContext. In addition to this useful method, in this post we're going to explore another handy one called AssociateWith and we'll see also how they can work together.
Let's refresh our memories and look at the domain again:

Using AssociateWith To Retrieve Only Specific Linked Values
While LoadWith and AssociatedWith are both methods on DataLoadOptions, they have quite different goals. LoadWith is used to Eager Load all children, whereas AssociateWith has no specific bearing on performance (at least not by design per se). Instead, AssociateWith is a means whereby we can constrain the child items that should come back for a parent based on a certain condition.
For instance, in our domain, assuming that whenever we retrieve the Products in a Category, and we want to do so by using the Products property on a Category instance, we only want Products with a UnitPrice greater than 12. This would let us write code like this:
using (NorthwindDataContext dc = new NorthwindDataContext())
{
dc.Log = Console.Out;
var categories = from c in dc.Categories
select c;
foreach (var category in categories)
{
sb.AppendLine(category.Products.Count.ToString());
}
}
but have this code automatically filter out those products cheaper than 12 (insert your currency value :->). We can do this by using the AssociateWith DataLoadOption, as follows:
using (NorthwindDataContext dc = new NorthwindDataContext())
{
dc.Log = Console.Out;
System.Data.Linq.DataLoadOptions dl = new System.Data.Linq.DataLoadOptions();
dl.AssociateWith<Category>(c => c.Products.Where(p => p.UnitPrice > 12));
dc.LoadOptions = dl;
var categories = from c in dc.Categories
...
Initially, without the AssociateWith, here are our SQL results:
SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture
]
FROM [dbo].[Categories] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID
], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnO
rder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[CategoryID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID
], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnO
rder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[CategoryID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
[... 8 of these in total, one per Category]
12
12
13
10
7
6
5
12
When we instead use the AssociateWith, we get the following appearing in our log and our results:
SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture
]
FROM [dbo].[Categories] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID
], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnO
rder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE ([t0].[UnitPrice] > @p0) AND ([t0].[CategoryID] = ((
SELECT [t2].[CategoryID]
FROM (
SELECT TOP (1) [t1].[CategoryID]
FROM [dbo].[Categories] AS [t1]
WHERE [t1].[CategoryID] = @p1
) AS [t2]
)))
-- @p0: Input Decimal (Size = 0; Prec = 33; Scale = 4) [12]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID
], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnO
rder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE ([t0].[UnitPrice] > @p0) AND ([t0].[CategoryID] = ((
SELECT [t2].[CategoryID]
FROM (
SELECT TOP (1) [t1].[CategoryID]
FROM [dbo].[Categories] AS [t1]
WHERE [t1].[CategoryID] = @p1
) AS [t2]
)))
-- @p0: Input Decimal (Size = 0; Prec = 33; Scale = 4) [12]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID
], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnO
rder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE ([t0].[UnitPrice] > @p0) AND ([t0].[CategoryID] = ((
SELECT [t2].[CategoryID]
FROM (
SELECT TOP (1) [t1].[CategoryID]
FROM [dbo].[Categories] AS [t1]
WHERE [t1].[CategoryID] = @p1
) AS [t2]
)))
-- @p0: Input Decimal (Size = 0; Prec = 33; Scale = 4) [12]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [3]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID
], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnO
rder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE ([t0].[UnitPrice] > @p0) AND ([t0].[CategoryID] = ((
SELECT [t2].[CategoryID]
FROM (
SELECT TOP (1) [t1].[CategoryID]
FROM [dbo].[Categories] AS [t1]
WHERE [t1].[CategoryID] = @p1
) AS [t2]
)))
-- @p0: Input Decimal (Size = 0; Prec = 33; Scale = 4) [12]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [4]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
[... 8 of these in total, one per Category]
10
11
10
9
5
5
4
8
So you can see we've narrowed down our results count, and we're just returning the Products that match our constraint. We can see in the query that it is explicitly looking for products where UnitPrice is > the @p0 parameter, and the value for this parameter is listed at the bottom of each query, as [12].
Other Uses Of AssociateWith
The help for AssociateWith lists info on what types it accepts as the expression property, and it lists this as follows:
// Identifies the query to be used on a particular one-to-many field or property.
// Note the following:If the expression does not start with a field or property
// that represents a one-to-many relationship, an exception is thrown.If an
// operator other than a valid operator appears in the expression, an exception
// is thrown. Valid operators are as follows:WhereOrderByThenByOrderByDescendingThenByDescendingTake
So essentially don't pass invalid operators, and only pass one-to-many relationships. Got it. What's interesting, if you can make out the BadlyPunctuatedOptionsAtTheEnd, is that there are a variety of operators we can call on AssociateWith aside from a Where. It seems that we can use both Ordering and the Take operator. For the former, this means that when we retrieve the products in a category, we can automatically order then by a specific value. For instance, we could write
dl.AssociateWith<Category>(c => c.Products.OrderBy(p => p.ReorderLevel));
For the Take operator, it means we can be explicit about the maximum number of items of the linked child type to return. I can specify, for instance, to only return 3 Products per Category:
dl.AssociateWith<Category>(c => c.Products.Take(3));
Because these are just standard LINQ query operators, we can even combine them, and Take the first 3 Products Ordered By ReorderLevel:
dl.AssociateWith<Category>(c => c.Products.OrderBy(p => p.ReorderLevel).Take(3));
Combining LoadWith and AssociateWith
AssociateWith, we can see, is quite a handy feature to know about. However, you might have noticed in the examples above that it's Lazy Loading! The next logical question, of course, is is it worthwhile to combine them? Anders Hejlsberg says no:
"LoadWith and AssociateWith actually do different things and it isn't really feasible to combine them. Specifically, LoadWith specifies the relationships you want to eager load with entities of a particular type, and AssociateWith specifies filters and/or orderings for particular relationships--regardless of whether those relationships are eager or lazy loaded."
But what does he know, after all <g>? I think he might have meant that it's important to understand the differences and that they are targeted at achieving different things. Let's have a look!
If we look at combining our first sample, we have the following code:
System.Data.Linq.DataLoadOptions dl = new System.Data.Linq.DataLoadOptions();
dl.LoadWith<Category>(c => c.Products);
dl.AssociateWith<Category>(c => c.Products.Where(p => p.UnitPrice > 12));
dc.LoadOptions = dl;
When we do this, here's the SQL that is generated:
SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture
], [t1].[ProductID], [t1].[ProductName], [t1].[SupplierID], [t1].[CategoryID] AS
[CategoryID2], [t1].[QuantityPerUnit], [t1].[UnitPrice], [t1].[UnitsInStock], [
t1].[UnitsOnOrder], [t1].[ReorderLevel], [t1].[Discontinued], (
SELECT COUNT(*)
FROM [dbo].[Products] AS [t2]
WHERE ([t2].[UnitPrice] > @p0) AND ([t2].[CategoryID] = ([t0].[CategoryID]))
) AS [value]
FROM [dbo].[Categories] AS [t0]
LEFT OUTER JOIN [dbo].[Products] AS [t1] ON ([t1].[UnitPrice] > @p0) AND ([t1].[
CategoryID] = ([t0].[CategoryID]))
ORDER BY [t0].[CategoryID], [t1].[ProductID]
-- @p0: Input Decimal (Size = 0; Prec = 33; Scale = 4) [12]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
Interesting, our query is now doing Eager Loading and only returning the products by constraint. We can start to get even smarter, e.g. by trying to combine the LoadWith with a more complex AssociateWith:
System.Data.Linq.DataLoadOptions dl = new System.Data.Linq.DataLoadOptions();
dl.LoadWith<Category>(c => c.Products);
dl.AssociateWith<Category>(c => c.Products.OrderBy(p => p.UnitPrice));
dc.LoadOptions = dl;
This time, we're trying to retrieve all Categories, and then get their Products ordered by price automatically. Here's our SQL:
SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture
], [t1].[ProductID], [t1].[ProductName], [t1].[SupplierID], [t1].[CategoryID] AS
[CategoryID2], [t1].[QuantityPerUnit], [t1].[UnitPrice], [t1].[UnitsInStock], [
t1].[UnitsOnOrder], [t1].[ReorderLevel], [t1].[Discontinued], (
SELECT COUNT(*)
FROM [dbo].[Products] AS [t2]
WHERE [t2].[CategoryID] = ([t0].[CategoryID])
) AS [value]
FROM [dbo].[Categories] AS [t0]
LEFT OUTER JOIN [dbo].[Products] AS [t1] ON [t1].[CategoryID] = ([t0].[CategoryI
D])
ORDER BY [t0].[CategoryID], [t1].[UnitPrice], [t1].[ProductID]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
So we're still doing ok. What about with the Take? The syntax is easy enough:
System.Data.Linq.DataLoadOptions dl = new System.Data.Linq.DataLoadOptions();
dl.LoadWith<Category>(c => c.Products);
dl.AssociateWith<Category>(c => c.Products.OrderBy(p => p.UnitPrice).Take(3));
dc.LoadOptions = dl;
But here's our SQL:
SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture
]
FROM [dbo].[Categories] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
SELECT TOP (3) [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[Ca
tegoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[
UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[CategoryID] = (@x1)
ORDER BY [t0].[UnitPrice]
-- @x1: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
SELECT TOP (3) [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[Ca
tegoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[
UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[CategoryID] = (@x1)
ORDER BY [t0].[UnitPrice]
-- @x1: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
[... 8 of these again...]
And we're back to Lazy Loading. So, it seems that LoadWith and AssociateWith can be used together, but with a big caveat - it depends what you're trying to AssociateWith! This makes sense, of course, if you think about the SQL that is needed underneath. In fact, it's pretty impressive that LINQ To SQL knows what it can and can't do, and that it does try to optimise where it can. However, it appears that Anders was right once again, and that you might not be able to depend on these two working together depending on what you're supplying them. Where you're not using Take, though, it's a pretty powerful combo.
When using an OR/M like LINQ To SQL, it is vitally important in almost all instances to understand the conception of lazy loading - what it means, what it does, and how to best work with it.
In short, Lazy Loading is the concept of the data only being retrieved from the database when you want to use it. For instance, you can load the customers you want, work with them, e.g.:
var customers = from c in dc.Customers
select c;
Console.WriteLine(customers.First().CompanyName);
and then, when you want a column (basically another property on your class) or even an entire child collection that you didn't load initially, e.g.
foreach (var order in customers.First().Orders)
{
Console.WriteLine(order.OrderDate);
}
LINQ will automatically go back to the database and fetch all the orders for that customer. Sounds great, hey? In fact, if you're writing a tiny mobile app with a SqlCE database embedded in memory, that's a great thing as it speeds up coding nicely. HOWEVER, most of us are writing multi-user, distributed systems. At 'best' we're writing a smart client with a bunch of code in the same context that at least still maintains knowledge of a link to the database. In these cases, we're in line for a potentially serious performance implication as these multiple, unknown, unspecified (at least not explicitly) queries are firing. If you're new to Lazy Loading, you're not even likely to be aware it's happening. At 'worst' we're already over the wire at the client end of our web service, so lazy loading is no longer even an option and the code simply won't function as expected (maybe the 'best' and 'worst' should be switched around if you've got decent unit tests as the tests would work in the 1st instance, even though performance would be miserable).
Now, I think it's clear that I'm not a huge fan of Lazy Loading, by and large. Dinesh.Kulkarni, a senior guy on the LINQ To SQL team, seems to feel the same, and puts it well:
OK. So I have to admit I am not a fan of lazy loading - by any name - deferred loading, demand loading, just-in-time loading, blah loading. For me, it is all about including what you want and then shutting off the darned spigot that brings in the objects. No more lazy stuff coming in from who knows what state and who knows in what amounts!
His post is aptly titled: Lazy ORM users deserve it! <g>.
If I didn't make it clear above, it's important to note that LINQ to SQL not only supports but enables Lazy Loading by default. Now, this might not be a bad thing because assume for instance an object hierarchy with 6 - 7 levels of child objects. In fact, a 'lookup' table, something like CustomerType, usually ends up as a 'parent' entity because of the way the foreign keys are specified on the database. So, if you wanted to just bind these in a dropdown with name and value, you would be returning every object all the way down your hierarchy. Assume, for instance, 5 CustomerTypes, 1 million Customers, 5 Orders per Customer with 3 lines each, and we're suddenly getting a LOT of data. So, Lazy Loading turned on is not necessarily all that bad and this is really the goal of having it as a feature.
However, I can only agree with this if it can be turned off and on very easily. Ideally, I like the ability to turn it on and off easily for specific scenarios, too. Not all OR/M frameworks do this, but thankfully LINQ To SQL does allow you to specify when to turn off this option (i.e. when to 'eager' load) using the DeferredLoadingEnabled property on the DataContext and specifically what to include. In addition, specifying what to include can involve not just child tables, but those based on specific conditions as well (e.g. Deleted flag = false). This is done using the DataLoadOptions property on the DataContext. Specifically, it lets you use the LoadWith and AssociateWith methods, and we'll be exploring how these work and how they can be used, either alone or together.
Using LoadWith To Eager Load
For the examples we're going to explore, this is the domain we're working with:
So it's basically just the Categories and Products tables from Northwind, and we'll use something like the following code to evaluate the database queries:
StringBuilder sb = new StringBuilder();
using (NorthwindDataContext dc = new NorthwindDataContext())
{
dc.Log = Console.Out;
var categories = from c in dc.Categories
select c;
decimal categoryTotal;
foreach (var category in categories)
{
categoryTotal = 0;
// another sql call per category(!)
foreach (var product in category.Products)
{
if (product.UnitPrice.HasValue)
categoryTotal += product.UnitPrice.Value;
}
sb.AppendLine(categoryTotal.ToString());
}
}
Console.WriteLine();
Console.WriteLine(sb.ToString());
Console.ReadLine();
The code queries for the categories in the Categories table, then loops through these and loops through each product in a nested For loop, adding up all the totals. I write the totals out at the end in the StringBuilder because I want to group all the outputted SQL together and also because we'll want to look at these later when we use AssociateWith. the DataContext.Log allows you to specify a location to log all of the SQL that will run against the database. The property just takes in a TextWriter, and I've simply set it to Console.Out in this console app to see the SQL code that will execute written out to the console. If you wanted to do some verbose logging at run time, you can store this text, perhaps into a tracing database. Here's an implementation of a sample to capture this text.
You can see from the comment I've added that I'm expecting multiple calls every time the products are looped through, and the output console clearly shows this:
SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture
]
FROM [dbo].[Categories] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID
], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnO
rder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[CategoryID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID
], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnO
rder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[CategoryID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
[... 8 of these in total, one per Category]
455.7500
276.7500
327.0800
287.3000
141.7500
324.0400
161.8500
248.1900
Now, I want to turn this Lazy Loading off, because I know I want to return all of the products at the same time for my categories. To do this, I make the following modification before retrieving the categories:
using (NorthwindDataContext dc = new NorthwindDataContext())
{
dc.Log = Console.Out;
System.Data.Linq.DataLoadOptions dl = new System.Data.Linq.DataLoadOptions();
dl.LoadWith<Category>(c => c.Products);
dc.LoadOptions = dl;
var categories = from c in dc.Categories
...
Now, I execute the program again, with no other changes, and we can see the change in the database calls:
SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture
], [t1].[ProductID], [t1].[ProductName], [t1].[SupplierID], [t1].[CategoryID] AS
[CategoryID2], [t1].[QuantityPerUnit], [t1].[UnitPrice], [t1].[UnitsInStock], [
t1].[UnitsOnOrder], [t1].[ReorderLevel], [t1].[Discontinued], (
SELECT COUNT(*)
FROM [dbo].[Products] AS [t2]
WHERE [t2].[CategoryID] = [t0].[CategoryID]
) AS [value]
FROM [dbo].[Categories] AS [t0]
LEFT OUTER JOIN [dbo].[Products] AS [t1] ON [t1].[CategoryID] = [t0].[CategoryID
]
ORDER BY [t0].[CategoryID], [t1].[ProductID]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8
455.7500
276.7500
327.0800
287.3000
141.7500
324.0400
161.8500
248.1900
So, we've gone from 9 (1 for categories + 8 for each group of products) database calls to 1. Think about the impact of this on a list of 100 or 1000 items, and it gets quite dramatic! In short, I recommend using LoadWith often. You can even specify these DataLoadOptions at multiple levels. For instance, for Customers with Orders, where the Orders have associated OrderDetails, you can load them all together when you need them as follows:
dl.LoadWith<Customer>(c => c.Orders);
dl.LoadWith<Order>(o => o.Order_Details);
Reusing DataLoadOptions
One concern you might have is that you don't want to specify these options all the time by hand, and would rather be able to group them once. To do this, simply place these into a method in a partial class for the DataContext. Right click on the designer in the DataContext and select View Code, then add the following into the new partial class:
namespace LinqEagerLoading
{
partial class NorthwindDataContext
{
public void AddSomeSpecificEagerLoadOptions()
{
System.Data.Linq.DataLoadOptions dl = new System.Data.Linq.DataLoadOptions();
dl.LoadWith<Category>(c => c.Products);
this.LoadOptions = dl;
}
}
}
Then you just call this method before your first call to get the categories. If you decide you always want these to load explicity, you can even set this in the partial OnCreated method which is called by all of the overloaded constructors for the DataContext:
namespace LinqEagerLoading
{
partial class NorthwindDataContext
{
partial void OnCreated()
{
System.Data.Linq.DataLoadOptions dl = new System.Data.Linq.DataLoadOptions();
dl.LoadWith<Category>(c => c.Products);
this.LoadOptions = dl;
}
}
}
If you prefer having the options as a method, but want to have multiple versions of these based on certain conditions, you can create multiple methods, or one method that takes in a parameter to define what to load at the time. Either way, it's clear this DataLoadOptions property is an important aspect in the Lazy Loading discussion.
Steve Herbert has an interesting performance evaluation discussion up on his blog (this is raw LINQ, by the way, not LINQ to SQL). Initially, he has 3 samples, the first 2 using LINQ and the 3rd without. These are the results:
Item#1 – averaged 2643 ms
Item#2 – averaged 671 ms
Item#3 – averaged <1 ms
However, make sure to read through the comments on the post. What emerges is from Eric Lee who comments:
"So the lesson to learn here is that when you use Linq, you either need to make sure your container implements IQueryable or be prepared for IEnumerable-based algorithm performance ... How could Linq possibly know how to take advantage of the quirks of every conceivable data structure? Obviously it can't. It has its own strategy that works against anything that can manage to implement IEnumerable, and it can take advantage of custom strategies if they're provided via IQueryable."
The list of structures and frameworks that implement IQueryable include LINQ To SQL, LINQ To Datasets, LINQ To XML, LINQ To Amazon, LINQ To SharePoint, LINQ To Active Directory, LINQ To Flickr, LINQ To NHibernate, LINQ To REST, even an attempt at LINQ To Google Desktop and I'm sure there are more coming.
For a discussion on LINQ To SQL performance, see my post on this.
I just got a mail from someone who had been trying to contact me saying that the Contact page on my blog doesn't seem to be functioning. I'm investigating the issue, but in the meantime please feel free to get hold of me at hilton thAT giesenow thD.T com