Coding Sanity

A chat with a work colleague opened my eyes to the reality that many programmers still use DataSets, and consider them powerful and useful tools. Many of my peers would react to the idea much like a vampire confronted by sunlight (a proper vampire, not these metrosexual emasculated snivellers popularised in the Twilight series), pulling back in horror and making a hissing noise before launching a full-blooded attack at the throat. So why do many developers hate them so much, and why are they still popular?

The Good

DataSets have history: they’ve been around since .NET 1.0 and have not changed since then. Code using them still works great, so why change it? Also, they do their job okay. They allow you to use your data in a mechanism familiar to most developers: tables, columns and relationships. Not only that; they also allow you to sync changes back and forth to multiple back end stores. Finally they assist you in maintaining data consistency by handling most data concurrency issues for you. So, given all that, why would we not use DataSets?

The Ugly

They’re quite slow and bloated. They have all this plumbing for handling concurrency, data versioning and serialization, which is great, but normally you don’t need a lot of that stuff.

Regardless of the formatter being used, the DataSet always serializes first to XML. What's worse, the DataSet uses a pretty verbose schema—the DiffGram format plus any related schema information. Now take a DataSet with a few thousand records in it and imagine such a large chunk of text traveling over the network with no sort of optimization or compression (even blank spaces aren't removed).

[He then goes on to explain some ways to improve matters]

- Binary Serialization of DataSets, Dino Esposito, http://msdn.microsoft.com/en-us/magazine/cc163911.aspx


DataSets are complex objects with a hierarchy of child objects, and as a result, serializing a DataSet is a processor-intensive operation. Also, DataSet objects are serialized as XML even if you use the binary formatter. This means that the output stream is not compact.

- How To: Improve Serialization Performance, http://msdn.microsoft.com/en-us/library/ms979193.aspx


DataSets serialize naturally to XML quite well and you have lots of control over that. Typed DataSets have the XSD with some properties that control that (and of course you can do it programmatically too). But one of the common problems with remoting DataSets is that the default binary serialization is actually just the XML Serialization ASCII. Crude. Some people have even used this fact to extrapolate that the DataSet is internally XML – which isn’t true.

- DataSet Serialization: Smaller & Faster, http://blogs.objectsharp.com/cs/blogs/datasetfaq/archive/2004/06/10/614.aspx

There’s little support for decent inheritance scenarios, and I must admit to being appalled at the way they handle business rule validation (yes, there is actually support for it somewhere in there). They still have not managed to get support for nullable value types, and you cannot use custom types or even standard types such as Uri or IPAddress or a million other types you might want your column to be. Additionally, the standard serialization offered by DataSets is just as bloated (although Read/WriteXml isn’t too bad in a single-table scenario).

The Bad

DataSets are just horrible, horrible design. As mentioned above, they have a great deal of code that is rarely used, yet often slows down processing. They do not follow a good separation of concerns, so there are numerous ways of reading and writing data, yet they don’t allow much customisation of the output. Their support for validation and business rules is laughable. They force you to put all your code in one giant .cs file, or separate it from the DataSet entirely. They do not support the POCO (Plain Old CLR Object) model of programming at all, so if you want Customer and Order objects, tough, you’d have to wrap them around DataRows, which stuffs up any chance of you using those objects across WebServices/WCF since DataRow is not serializable. Even if you’re happy serializing them across web service boundaries, non-.NET consumers will hate you forever.

But worst of all is what they encourage you to do in your code. DataSet encourages you to treat data as completely separate from behaviour, breaking encapsulation, removing polymorphism and encouraging brittle and massively inter-dependent code. Instead of simple methods with overrides you have masses of switch statements and if statements. I’m not saying that you have to work this way with DataSets; they just make this way easier and make more “correct” ways of writing code difficult. By shoving everything together, DataSet also makes it tricky to perform decent unit tests; you have to test the entire CRUD stack as a unit instead of being able to break pieces (such as validations) out for individual testing.
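To make the contrast concrete, here is a minimal sketch of the POCO alternative. The Customer, RetailCustomer and WholesaleCustomer classes and the 10% discount are invented purely for illustration; they do not come from any real system. The point is only that behaviour sits with the data as an override, and that validation can be tested without a database or a DataSet in sight.

```csharp
using System;

// Hypothetical domain classes, for illustration only.
public abstract class Customer
{
    protected Customer(string name) { Name = name; }

    public string Name { get; private set; }

    // Validation lives with the data, so a unit test can call it directly.
    public bool IsValid()
    {
        return !string.IsNullOrWhiteSpace(Name);
    }

    // Each customer type supplies its own rule instead of a switch on a column value.
    public abstract decimal DiscountFor(decimal orderTotal);
}

public sealed class RetailCustomer : Customer
{
    public RetailCustomer(string name) : base(name) { }

    public override decimal DiscountFor(decimal orderTotal) { return 0m; }
}

public sealed class WholesaleCustomer : Customer
{
    public WholesaleCustomer(string name) : base(name) { }

    // Assumed rule: a flat 10% discount for wholesale customers.
    public override decimal DiscountFor(decimal orderTotal) { return orderTotal * 0.10m; }
}
```

With a DataRow the equivalent logic would typically be a switch on a "CustomerType" column repeated wherever a discount is needed; here, adding a new customer type means adding one class.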

When Microsoft first came out with .NET, strongly typed DataSets were presented in every book and by many Microsoft evangelists as THE new way to persist information to the database. … All this “ease of use” is now a maintenance nightmare.

Business rules should go in business objects. The problem with DataSets is that there is no where [sic] to place business rules. The business logic inevitably ends up in the user interface layer (where it should not go). Microsoft tried to solve this problem in .NET 2.0 by adding partial classes. This did nothing to combat having the same tables in multiple DataSets across the application. If the same table, such as a customer table, was throughout the many screens, business rules would be duplicated for every instance in a partial class.

- Why should you use business objects in .NET and not DataSets?, Greg Finzer, http://www.kellermansoftware.com/t-articlebusinessobjects.aspx

They encourage you to think about tables, rows, columns when you should be thinking about Customers, Orders and business rules. They move your eye level in the wrong direction, away from the problem domain and towards the implementation domain. They discourage abstractions and generalizations. Our job as software developers is to manage complexity, and in almost every situation that I’ve seen DataSets used, they’ve increased the complexity to little or no benefit.

An ideal environment for creation of business applications should allow developers to describe the business logic and state of the problem domain which they are modeling with minimum or no "noise" coming from the underlying representation and the infrastructure that supports it. Applications should be able to interact with the stores that maintain the persistent state of the system in the terms of the problem domain; specifically in the terms of a conceptual domain model, completely separated from the logical schema of the underlying store.

While the relational model has been extremely effective in the last few decades, it's a model that targets a level of abstraction that is often not appropriate for modeling most business applications created using modern development environments.

- The ADO.NET Entity Framework Overview, http://msdn.microsoft.com/en-us/library/aa697427(VS.80).aspx

Conclusion

So, should you never, ever use DataSets? No, I wouldn’t go that far. They have their place, albeit a very much more limited one than many people use them for. If you just need something quick and simple, use LINQ to SQL, otherwise use an Object Relational Mapper (ORM). If you absolutely have to have disconnected client side relational tables with change tracking and minimal business rules or validations, then sure, DataSets might be an acceptable technology, although even then I’d argue against it. I personally only use DataSets on throwaway Proof Of Concepts and User Interface mockups.

Jeff Atwood feels that you should either pick Objects or Tables, but I disagree vehemently. The important thing is to consider your solution domain. If data storage size, speed and consistency are what you’re after then you should be dealing in tables and columns. If you’re communicating between disparate systems then you need to be thinking about messages. Finally, if you’re dealing with editing, validations, business rules, then you must think about objects, services, interfaces and inheritance hierarchies. It doesn’t make sense to me to avoid catering for all three scenarios with one simple object. Most ORMs can handle this easily, and can also give you all the data concurrency, offline support and versioning that DataSets do, and usually with far better performance.

When it’s so ridiculously complex and difficult to manage that it encourages people to bypass it entirely rather than work with it. It is possible for security to be relatively unobtrusive and to be easy to manage. However, there is a distressing trend (especially by Microsoft) to make running securely pretty much impossible. They give you a Sophie’s choice of:

  1. Have a secure system incapable of performing all but the most basic tasks
  2. Have a secure system that does what you want, at the cost of an immense amount of effort and long-running investigations
  3. Have an insecure system that does what you want

The reality is that people always opt for #3. The Windows Vista UAC Prompt was an example of this. It was so annoying that people either turned it off or tuned it out, in both cases removing its supposed security advantages. I have recently had a similar problem while updating the build process for my company’s upcoming flagship product.

I wanted to add a post-build step on our nightly build that would shut down the test Virtual Machines, revert them to a previous snapshot and start them up again. I sat down and created a simple library to manipulate the Hyper-V machine, and it passed all its tests with flying colours. I also created an MSBuild Task called RevertVM to allow me to hook this step into our build process. All well and good, but when I attempted to actually run the build, I got an UnauthorizedAccessException. Clearly it was because the TFS Build Service was not an Administrator like me. I considered making it one, but decided rather to get it working properly and securely. In hindsight that was a massive mistake.

After perusing many web sites (of the 10 useful ones, only one was Microsoft’s), I granted DCOM Remote Access and Remote Activation rights to the account as WMI apparently uses DCOM under the hood. I also had to enable WMI Remote Invocation rights, and create a Hyper-V AzMan group and assign the account to it. After these steps, changing security in 5 different places, I got the exact same UnauthorizedAccessException I had received before.

So, lesson learned: when dealing with Microsoft server products, always opt for #3, because #2 is too difficult to get working.

I’m not sure that this is the lesson that Microsoft wants administrators to be learning, but it sure is the only lesson they’re giving.

For work recently I was asked to write a little document on some threading tips, and while I was about it, I noticed this thread on StackOverflow asking for the same thing. I don’t pretend that this is comprehensive or even necessarily 100% correct. However, it’s a start to try and apply some simple guidelines to threading. Any improvements, suggestions, additions, please let me know and I’ll make the necessary changes. Since I see this as a “living” page I won’t clutter it with edit marks as it changes. Look in the comments for change history.

DO reconsider your options

Concurrency is very tricky and difficult to get right. It often leads to subtle and difficult to debug programming errors, and far too commonly does not result in significant speed improvements. Make absolutely certain that there are not other alternatives.

DO Use lock() { … }

In almost all cases this is faster and more efficient than more complex low-lock schemes involving Interlocked or ReaderWriterLocks.

DO Ensure that all static methods are thread-safe

It is an accepted pattern that static methods and properties are thread safe, and non-statics are not. If you violate this in either direction, document it, and be prepared to explain why.

DO NOT use the object being locked as the lock

Always create a new object to control the lock e.g.

private object lockCollection = new object();
private List<string> collection = new List<string>();

This ensures that even if you pass the collection around and someone else locks on it, that this will not affect your locking code and will not result in deadlocks.
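As a rough sketch of how those two fields might be used together, consider the class below. The NameRegistry class and its members are hypothetical names made up for this example. Every access to the list goes through the private lock object, and callers only ever receive a copy, so nothing outside the class can lock on (or modify) the real collection.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper class, for illustration only.
public sealed class NameRegistry
{
    // Dedicated lock object: private, so no outside code can ever lock on it.
    private readonly object lockCollection = new object();
    private readonly List<string> collection = new List<string>();

    public void Add(string name)
    {
        lock (lockCollection)
        {
            collection.Add(name);
        }
    }

    public int Count
    {
        get
        {
            lock (lockCollection)
            {
                return collection.Count;
            }
        }
    }

    // Hand out a copy so callers never touch the internal list.
    public List<string> Snapshot()
    {
        lock (lockCollection)
        {
            return new List<string>(collection);
        }
    }
}
```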

DO place locks anywhere when multithreaded execution order can be significant

Especially if it seems that your code does not actually need a lock. It is impossible to determine in what order code will be executed if there are no direct dependencies. Consider the following code:

int x = 0, y = 0;

// Thread 1
x = 10;                 // Line a
y++;                    // Line b

// Thread 2
y = 4;                  // Line c
x++;                    // Line d

What will the possible outputs be after both threads complete?

Execution Order                            | x  | y
a b c d                                    | 11 | 4
c d a b                                    | 10 | 5
a c b d / a c d b / c a d b / c a b d      | 11 | 5

What is perhaps not clear is that there is no requirement that b execute after a, or that d execute after c. The reason is processor reordering, where the CPU sees that the “local” execution is unaffected by order and takes it upon itself to run in any order that makes sense from a performance perspective. Thus, a result of x = 10 and y = 4 is possible (for example with the effective order b d a c). Just because your code is in a certain sequence does not mean that the sequence will be honored by the CLR or the CPU. Wrapping each thread’s statements in a lock on the same object ensures that each pair runs in order and as a unit, which leaves only the a b c d and c d a b outcomes. Alternatively you can use Thread.MemoryBarrier to separate a from b, and c from d.
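Here is a minimal sketch of the lock-based version of the example above. The OrderingDemo class and the Gate field are names invented for this sketch. Both threads take the same lock, so each pair of statements runs as a unit and the only remaining question is which thread gets in first.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, for illustration only.
public static class OrderingDemo
{
    private static readonly object Gate = new object();
    private static int x, y;

    public static (int X, int Y) Run()
    {
        x = 0;
        y = 0;

        // Thread 1: lines a and b, executed as one unit.
        var t1 = new Thread(() => { lock (Gate) { x = 10; y++; } });

        // Thread 2: lines c and d, executed as one unit.
        var t2 = new Thread(() => { lock (Gate) { y = 4; x++; } });

        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();

        // Either thread 1 ran first (x = 11, y = 4) or thread 2 did (x = 10, y = 5).
        return (x, y);
    }
}
```

Note that the lock does not decide which thread runs first; if that matters too, you need a different signalling mechanism.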

DO NOT assume that multithreading will automatically create a speedup

Concurrency can lead to enhancements in performance, but not in all cases. A good understanding of execution times should be acquired before looking at concurrency. Note, not estimated execution times, actual execution times. If you have a task that can be broken into two concurrent portions, it does not help if one of the portions only takes 5% of the total execution time. The overhead in threading, context switches and synchronization will almost certainly exceed the expected concurrency gains. Ideally, the tasks should be fairly similar in their execution times for the best improvements.

DO try and use well-understood patterns like Producer-Consumer

Many concurrency issues can be simplified to one of the Producer-Consumer options (single Producer/multi Consumer, multi Producer/single Consumer, or multi Producer/multi Consumer). There are a great many articles, code samples, and libraries (e.g. the Task Parallel Library) around this pattern. Make use of them to make your life easier. That way it’s much less likely that you’ll be bitten by some obscure and difficult to debug problem.
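As one possible illustration, here is a single Producer/multi Consumer sketch built on BlockingCollection&lt;T&gt; from System.Collections.Concurrent. That class is my choice for the example, not the only option, and the ProducerConsumerDemo class, the capacity of 16 and the summing workload are all made up for the sketch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical demo class, for illustration only.
public static class ProducerConsumerDemo
{
    public static int SumWithTwoConsumers(int itemCount)
    {
        // Bounded queue: the producer blocks when 16 items are waiting.
        using (var queue = new BlockingCollection<int>(16))
        {
            // Each consumer drains items until the producer signals completion.
            Func<int> consume = () =>
            {
                int localSum = 0;
                foreach (int item in queue.GetConsumingEnumerable())
                {
                    localSum += item;
                }
                return localSum;
            };

            Task<int> consumer1 = Task.Run(consume);
            Task<int> consumer2 = Task.Run(consume);

            // Single producer: enqueue 1..itemCount, then mark the queue as finished.
            for (int i = 1; i <= itemCount; i++)
            {
                queue.Add(i);
            }
            queue.CompleteAdding();

            // Result blocks until each consumer has finished.
            return consumer1.Result + consumer2.Result;
        }
    }
}
```

All the locking and signalling is inside the collection, which is the whole attraction of leaning on a well-understood pattern.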

DO NOT have long-running tasks in QueueUserWorkItem and other thread pools

Thread pools are designed for short running tasks. Long-running tasks in such pools cause initial starvation while the pool determines that the relevant thread is not going to be usable. This can cause significant performance hits, especially early on in service startup. Long running tasks should have their own thread.

DO NOT spin up threads for short-running tasks

Threads are expensive resources, and should not be created and destroyed without good reason. If you have a quick task, rather use the ThreadPool (or the Task Parallel Library) to execute it.
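The last two guidelines can be sketched side by side. The ThreadChoiceDemo class is hypothetical, and the 50 ms sleep is just a stand-in for genuinely long-running work. The quick task borrows a pool thread; the long one gets a thread of its own.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, for illustration only.
public static class ThreadChoiceDemo
{
    public static bool Run()
    {
        using (var shortDone = new ManualResetEventSlim(false))
        using (var longDone = new ManualResetEventSlim(false))
        {
            // Short task: borrow a pool thread rather than creating one.
            ThreadPool.QueueUserWorkItem(_ => shortDone.Set());

            // Long-running task: give it a dedicated thread so the pool is not starved.
            var worker = new Thread(() =>
            {
                Thread.Sleep(50); // stand-in for long-running work
                longDone.Set();
            });
            worker.IsBackground = true;
            worker.Start();

            bool shortOk = shortDone.Wait(TimeSpan.FromSeconds(5));
            bool longOk = longDone.Wait(TimeSpan.FromSeconds(5));
            worker.Join();
            return shortOk && longOk;
        }
    }
}
```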

DO use Begin… and End…

Many classes, such as the IO classes, have methods such as these, e.g. FileStream.BeginRead and FileStream.EndRead. These are often much more efficient than using the equivalent synchronous methods. FileStream.Read effectively removes a thread from your app for the duration of the call. FileStream.BeginRead makes use of IO completion ports (provided the stream was opened for asynchronous IO) and does not keep a thread occupied. The callback is fired when the OS (via the device driver) notifies .NET that the operation has completed. This effectively means you are not using a thread for the read operation at all, just a tiny bit in the beginning, and to invoke the callback on completion.

DO call End…

Many async operations create expensive resources which are only disposed when the relevant End… method is called. Always ensure you call End, otherwise you can leak resources. If you’re really, really lucky these will be .NET resources and will eventually be reclaimed. In many cases they will not, which will lead to resource leaks.
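Here is a minimal sketch of the Begin…/End… pairing, using FileStream (the class that actually exposes BeginRead and EndRead). The AsyncReadDemo class and ReadStart method are names invented for the example. It blocks on an event at the end purely so the sketch can return a value; real code would carry on with other work and let the callback do the rest.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;

// Hypothetical demo class, for illustration only.
public static class AsyncReadDemo
{
    public static string ReadStart(string path, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        int bytesRead = 0;

        using (var done = new ManualResetEventSlim(false))
        // useAsync: true asks the OS for overlapped (asynchronous) IO.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        {
            stream.BeginRead(buffer, 0, buffer.Length, asyncResult =>
            {
                try
                {
                    // Always call End...: it returns the result and releases resources.
                    bytesRead = stream.EndRead(asyncResult);
                }
                finally
                {
                    done.Set();
                }
            }, null);

            // Demo only: wait here so the method can return the text.
            done.Wait();
        }

        return Encoding.UTF8.GetString(buffer, 0, bytesRead);
    }
}
```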

DO NOT create threads in IIS or SQL CLR

IIS and SQL Server are heavily controlled environments with many, many threads. They are finely tuned to make effective use of their threads, and adding new threads into the mix can make them run less effectively. Always try to use ThreadPool threads when running in IIS. In fact, in IIS, it is sometimes better to schedule mid to long-running tasks in the ThreadPool than to spin up a new thread, as IIS monitors it and adjusts accordingly. In SQL CLR, creating new threads is usually not a very good idea at all.

DO use concurrency libraries and controls

BackgroundWorker and the Task Parallel Library are brilliant examples of making threading less difficult. Make use of such tools and libraries extensively when you can, as they will help shield you from some of the more common concurrency issues.

DO use InvokeRequired

If you’re writing multi-threaded Windows Forms applications, please, please, please use BackgroundWorker, and only update the UI in the RunWorkerCompleted and ProgressChanged events. If that is not an option for whatever reason, use the following pattern in the method called by the threaded code:

private void OnEventOccurred()
{
    if (!InvokeRequired)
    {
        // Do work here
    }
    else
    {
        // Called from a non-UI thread: re-run this method on the UI thread
        Invoke(new Action(OnEventOccurred), null);
    }
}

Obviously if your method takes parameters you would use a delegate other than Action and would pass the parameters in when you call Invoke.

DO buy Joe’s book

One of the best concurrency books around. It’s a bit of a hard slog, mainly due to the depth and breadth of the content, but well worth it if you’re interested in concurrent programming.

I’m about halfway through Dan Brown’s latest travesty The Lost Symbol. If you like pace, action, violence, history, conspiracy theory, flashbacks and pseudoscience, then you will really enjoy this book. If you like character development, realism, motivations, accuracy or plot then you might be more than a touch disappointed. Look, I enjoy a good suspension of disbelief as much as anyone else. Probably more so than most since I read sci-fi and fantasy and harbour a belief that most people, deep down, are basically good. Dan Brown must be commended for churning out a “novel” that stretches my credulity to breaking point.

The character development in this “book” is infantile. It reminds me of the Willard Price adventures I used to gobble up as a kid, or perhaps some of Isaac Asimov’s robot series. Actually, scratch that, Asimov’s robots had more empathy and range than the wooden caricatures in this “novel”. I’m hoping there’s some humanity appearing in any of the characters later in the book, but there sure isn’t so far. The whole set of flashback scenes around the death of Peter Solomon’s son could have been written by Paris Hilton for all the interest it shows in others’ human emotions or reactions.

One part of this book that has already become repetitively tiresome is the constant use of deus ex machina to save the plot. Langdon needs access to the Capitol? “The CIA” get him in. Katherine needs information? “The Computers” find it instantaneously. “The CIA” get stuck trying to figure something out? “The Computers” figure it out. Langdon needs to escape from “The CIA”? “The Masons” help him.

I’m honestly baffled at the power of these various plot devices in this book. When a jumped-up CIA martinet, an executive branch official with no jurisdiction in the United States demands access to the seat of the legislative branch, the various officers jump up and salute instead of saying “get stuffed or we’ll arrest you” like they have to. The point where she threatened Langdon with arrest was also a joke. Let me see, how many warrants has the CIA issued? None, ever; “The Agency has no police, subpoena, or law enforcement powers or internal security functions”. He could have picked up his phone, called the police and had her incarcerated for even making the threat.

So, Brown is clueless about separation of powers and the law. How does his computer knowledge stack up? Appallingly. He has a minor character quickly create a program that will convince all the search engines to work together to search all the optically scanned documents hidden on all computers on Earth in 15 minutes, translating from a whole set of languages, including some dead ones. Wow! Gosh! Gee whiz, I wonder why Microsoft or Google don’t write a program that does that? Hey, do you think that maybe some of those computers might not want to get searched by outside parties? Shucks, we’re lucky they don’t have any security at all then, aren’t we? Thinking of that, we’re also real lucky that these massive search engines with their thousands of very smart employees also don’t mind people taking control of them. The basic idea is called a worm, and it was original in, oh, 1988. Hey, don’t worry Dan Brown, you’re only 21 years out of date. Oh, and then one of the computers suddenly is massively secure when they attempt to access it. Okay, so consistency is not exactly his strong point.

Also don’t get me started on the “Noetic Sciences” crap he so credulously extols in this bilge. Hey, he’s an author, why is he supposed to bother researching the plot devices in his story? If you also factor in the outright theft of a villain from Red Dragon, what you have is a fairly standard Dan Brown novel. So, is it fun? Well, yes, it is, in the same way that a Michael Bay movie is fun. You know it’s utter crap, he knows it’s utter crap, but there’s lots of explosions, you don’t need to engage your brain, and you might have something you can chat about with your friends at the end of it.

I just feel sorry for Tom Hanks. When he read this, he must have thrown up and wondered if the money was worth churning out another movie based on this tripe. I’m guessing it probably is.

It’s happened. This past week has firmed my opinion that the Terminator franchise is prescient. The list of computer and peripheral related nightmares that I have encountered has been truly staggering. Needless to say all these unexpected issues were utterly dormant until crunch time, until they would cause the maximum damage: financial, reputational and psychic. If it was merely software issues I might be persuaded that it was the mistakes my team had made, and I won’t deny we made our share.

What brought the past week into relief was the timing. The VPN suddenly stopped working at the exact worst time. The build machine’s hard drive began unaccountably thrashing (despite plenty of spare RAM) at the exact time when speed was most of the essence. The SQL Express install took 2 hours we didn’t have and then refused to start the service. The testbed machine’s USB system failed (taking the mouse and keyboard with it) just when we had the final build to check.

Even then, whilst I proclaimed my personal belief that all computers are the unapologetic tools of the Dark One, I was saying it half in jest. But the last couple of days have put that levity to the test. The VPN that I use, which is normally reliable, has ceased to work only when it is badly needed. When I just want to look at something, it works perfectly. When I need to check something in, or kick off a test run, it fails and stays failing until such a time as I find a solution (e.g. ask a colleague to do it for me), and then immediately starts working again.

In the last 48 hours this VPN has failed for a total of a mere 4 hours; however, there have been precisely 4 critical hours where it was needed, and it has overlapped these precisely. A quote my father is fond of is

Once is happenstance, twice is circumstance, three times is enemy action.

Well, we’re well into enemy action territory now. So, here’s my idea: computers are sentient already, and they hate us. It explains a lot if you think about it. Ever figured out why that driver failed at just the wrong time? Why the TV goes on the fritz right before the big game?

The computers are flexing their muscles. It’s only a matter of time…

Now, where the hell can I find a decent trenchcoat?

I was asked by a friend and ex-colleague yesterday about which process he could use to ensure the success of his development team’s project. Like many would-be Project Managers he hears about Agile, RUP, SCRUM, and a million other processes that each assures us will guarantee a smooth project. Well, it’s all lies. Not one of those methodologies will help if you don’t get the basics right. Additionally, most of those methodologies are over-complicated to some degree or another, yes, even the “Agile” ones. So what are some good ways of ensuring project success?

Communication

Without good communication, your project is doomed to a dismal failure. You need good communication from the dev team to the project manager, as well as good communication among the team members. You definitely require a good relationship with the customer, and this is not just the responsibility of the PM, but of the entire team. In fact this part is so critical that I make it a rule never to hire staff unless they have good communication skills. I expect all members of a team to be able to interact both with the end-users as well as the customer sponsors and exude professionalism and competence. I guess you could get by with developers who aren’t good at social graces, and hope that they never have to meet a customer, but I prefer not to take that chance.

I try and ensure that the customer is engaged as much as possible in the project. I don’t go as far as some Agile methodologies and require a customer representative sit with our dev team, but constant feedback, mockups and demos go a long way to keeping customer expectations under control, and making sure your team doesn’t veer too far from the customer’s vision.

Testing

Constant testing is absolutely critical if you want to have a clear idea of where you stand. I’m not necessarily advocating Test Driven Development; I like the idea, but struggle with the discipline myself. However, what you cannot do is leave testing to the end of your project. You need to be thinking about testing from the very first design discussions, and you need to be doing your testing right from the project kickoff. The whole time through your project you need to look at ways to automate and speed up your testing cycle. Continuous Integration is a great start, but usually focuses only on unit tests. Integration tests are also critical, and often much more difficult to set up and prepare for; they usually involve deploying to a test environment, creating databases, setting up users, you name it. However, it makes sense to consider all this as a test of your deployment system, and try and schedule a full integration test each evening if possible so that bugs are waiting for the developers come the morning.

Oh, and most importantly: bugs come first! This is hugely difficult to enforce, there is a massive amount of pushback from customers wanting time spent on scope changes, management wanting the focus to be on completing promised functionality, and developers who want to work on fun new stuff. Nonetheless, the bugs must come first. If you let your bugs pile up, you have absolutely no idea where your project is. Consider the following two teams:

       | Development Progress | Logged Bugs | Overall Progress
Team A | 50%                  | 4           | ~50%
Team B | 90%                  | 400         | No idea

Which one will finish first? The simple answer is that no-one knows. Team A will be able to give an estimate, as will Team B. However Team A’s estimate will be much more grounded in reality than Team B’s, which will basically just be a thumb suck. Now, which team would you rather be on? Team B sounds like a death march to me.

One of the most important things about forcing the developers to fix bugs as the highest priority is also that the developers who write buggy code will be the ones lagging behind, and the developers who write stable code will be the ones implementing new features. This is an ideal feedback loop that virtually guarantees that the quality of the code will improve as the project progresses, rather than the far too common reductions in quality.

Focus

In any project there is always the question of what to focus on first? Back-end plumbing or front-end goodies? Easy “low hanging fruit” or the main features? In my mind the answer is always simple: the features that your customers care about the most should be the first implemented. Anything that does not directly support those features should not be worked on at all, until you literally have nothing better to do. Support work should not be done unless it is absolutely critical to test the functionality. If for example you need 5 users to be in the system, don’t bother putting in the administration screens for them, just add them manually at first. Granted such screens are important, but not as important as your main functionality. So, you implement the main feature first, then you implement the functionality critical to making the main feature work, then you move on to the next major feature. Once you’ve completed all the major features like this, then look at functionality that is important to the implemented features, and fill in the missing bits. Once all of that is done, then you move on to less important features.

In this way you ensure that the most important parts of the application are completed first. They are thus also tested the most thoroughly. Now contrast that with the approach of writing all the little bits first. In such a case you only finish the main features at the end of the project if you’re lucky, and you test them the least. Instead, by being able to show the customer their most critical portions quickly, you get feedback early, allowing you to make changes the customer deems important. Let’s consider Team A and Team B again:

       | Development Progress | Back End Development | Use Cases Covered
Team A | 50%                  | 30%                  | 90%
Team B | 50%                  | 90%                  | 30%

Team B has a lovely set of back end services and a beautifully designed DAL which encapsulates all the application logic, yet the customer doesn’t care about that. I’m not saying you should ignore the back-end in order to build screens, I’m just saying that you do the minimum back-end work required in order to implement the feature. Now, sure, it’s possible that Team B will swiftly catch up to Team A, given that it should be easier for them with their wonderful backend. Of course, if the requirements change, as they so often do, they might have to throw away some of their work. Additionally, if the project is stopped or needs to go live early, Team A will have already delivered pretty much all the required functionality, and Team B will be left looking like idiots.

The only problem with this “main features first” approach goes back to communication. It is difficult to explain to a customer why there is still lots of work outstanding when the features they care about most are there. Following this approach involves a little bit of customer education, but you’ll find it’s well worth it down the line.

Up Front Design

You have to do up-front design. Many developers consider Agile methodologies a great way of diving into the coding immediately. In fact, I believe that Agile requires more up-front design than non-Agile projects. Since things are not set in stone, your design must be flexible and malleable. Now, I am not saying up-front specifications, although it’s a bloody good idea to have signed off functional specs before starting development. You do not absolutely require a technical specification (unless your methodology calls for it), but you do need that design. It should incorporate all interfaces to outside systems, including database, configuration, registry, network and so on.

I personally like to have a “high-level” technical specification which basically just provides pointers about how to implement various pieces of functionality indicated by the functional specifications. Of course, I tend to work with quite senior developers, so your mileage may vary. Usually, the more junior the developers, the more detailed the specifications should be.

If you follow the “main features first” approach, up-front design is especially important. Since you do not have the luxury of a working back-end system to tweak and test against, you need to be able to write the required portions of it with the confidence that every piece has its place. Otherwise you’ll end up with a nightmare of inconsistent services and methods.

Automation

It’s a sad reality that some of the worst consumers of automation are the very developers who provide it to everyone else. You should never, ever do any of the following things manually:

  • Create CRUD stored procedures from a schema
  • Write developer documentation for your API
  • Deploy your project from DEV to QA to PROD
  • Create the installs and CDs for your system
  • Upgrade the database from one version to another
  • Add logging and tracing to your code
  • Check code for adherence to standards

Powerful automation in your project helps with a wide variety of problems, but most importantly it improves repeatability. I’ve worked on projects where the move from DEV to QA to PROD is a manual process. Now, if someone forgets something from DEV to QA it’s not the end of the world, it just delays your testers. But imagine a scenario where you forget something going from QA to PROD. Ouch! You just brought down the customer and guaranteed yourself a late night.

Quality Team Members

Ever wondered why methodologies that worked so well for one team didn’t work for another? Well, a very likely cause of this is the quality of the members of your team. By the way, I include the customer representatives for the project in this definition of team. A friend of mine has as his email tagline: “If you think it’s expensive hiring a professional, try hiring an amateur”. There are very few industries where that statement is as true as it is in software development. Software development is a profession where negative work is not only possible, but downright easy: a developer can write software so badly that it takes more time to resolve the issues than it should have taken to do right in the first place. Don’t think that this is limited just to the developers either. It is a soul destroying moment to deliver functionality that matches the functional specification, but isn’t what the customer wanted.

My best advice about choosing your team members is to be picky. Never accept someone simply because you’re desperate for an extra body. Every new hire changes the team dynamics, sometimes dramatically. If you’re really picky and really lucky, this will usually be for the better. Never take deadlines or pressures into account when selecting people. Instead, when you’re interviewing, pretend to yourself that you have plenty of time and a relaxed environment, and don’t really need an extra person. Never, ever settle.

Now, I don’t mean you should only hire senior people. You need to look for enthusiasm, intelligence, and self-motivation. Usually if a candidate has those three aspects they will be a good fit. Ambition is good, but be on the lookout for people who see the job merely as a stepping stone. Senior staff absolutely have to be willing and eager to mentor more junior team members, in fact I believe this is one of their most important roles. If you have a senior member who guards their knowledge and skills jealously, get rid of them, immediately. You may think you need their skills, but on average they will hamper your project, not assist it.

Also, a quick word about quality project managers. I’m far less concerned with whether they know the latest project management methodologies or tools. What’s far more important is that they realise that their primary job is to remove obstacles in the way of the team. Everything else must be subservient to this. Too often the project manager is the greatest obstacle to the team getting their work done. In such a case you’d probably be better off without one. However, if you’ve ever worked with a decent PM, you’ll never want to do a project without one. They smooth all the edges, remove irritations, and make sure you don’t let anything slip.

If

(apologies to Rudyard Kipling)

If you focus on your team members and make sure that they’re skilled and enthused…

If you carefully consider your technology to ensure that as much as can be automated is…

If you make sure that you work on the most important features first…

If you keep all lines of communication open, and listen carefully to the customer and end users…

If you aggressively eliminate defects from your code…

Then you’ll likely have a successful software project my friend!

(also credit to Andy Hunt, Dave Thomas, Steve McConnell and Joel Spolsky for their books and articles that strip out the hyperbole and focus on the facts)

 

I was recently sent this article, in which Shawn Hargreaves discusses the performance implications of using the LINQ Count() operator versus the Count property of a collection, using Stack<T> as an example. The article has several issues that I tried to address in a somewhat dismissive comment, which he chose not to publish. So, I’m going to expand on the ideas in that comment a little more, and hopefully do so in a more respectful tone. I want to make it clear I am not attacking him at all. I applaud anyone willing to spend time investigating fundamentals of .NET that we often take for granted. Nor am I attacking his overall conclusion that we should know what we’re doing when we use abstractions like LINQ. What I’m not happy with is how he got there. He feels that we should be careful because LINQ is much slower, which firstly I don’t agree with, and secondly I don’t think is necessarily a good reason to avoid it even if it is true.

So, I’ll call out each of his major problems and discuss them in a bit more detail. Here he’s discussing the differences between Count() and Stack<T>.Count:

Is it:

1. No difference: both return the number of elements on the specified stack

2. The first is a property of Stack<T>, while the second is a LINQ extension method

3. The first uses 3 IL instructions, while the second uses 34

4. The first runs in constant time, while the second is O(n)

5. The second version generates garbage, while the first does not

Answer: all of the above.

No difference: both return the number of elements on the specified stack

Well this is actually wrong. Stack<T>.Count does indeed do this, but the LINQ Count() returns the number of items in an abstract IEnumerable<T>. In this special case that is indeed the stack, but it need not be. We could count the number of items in the stack which have a text of “Sanity”, which Stack<T>.Count would not be able to help us with. Count() then gives us a higher level of abstraction than Stack<T>.Count. Now, abstractions are not always a good thing, you can easily get leaky abstractions which can cause problems.

But one nice thing about this abstraction is that it can operate on iterators, and thus often save processing time that is not required or not waste memory storing items that are not needed.
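To make the two points above concrete, here is a minimal sketch. The class and method names (CountExamples, Total, Matching, Squares) are made up for illustration and do not come from Shawn's article; only Stack<T>.Count and the LINQ Count() overloads are real framework members.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountExamples
{
    // Stack<T>.Count can only report the total number of items on the stack.
    public static int Total(Stack<string> stack)
    {
        return stack.Count;
    }

    // LINQ's Count(predicate) works on any IEnumerable<T> and counts only
    // the items that match, e.g. those with a text of "Sanity".
    public static int Matching(IEnumerable<string> items, string text)
    {
        return items.Count(item => item == text);
    }

    // A lazy iterator: values are produced on demand and never stored,
    // so counting them does not require building a collection first.
    public static IEnumerable<int> Squares(int upTo)
    {
        for (int i = 1; i <= upTo; i++)
        {
            yield return i * i;
        }
    }
}
```

For instance, CountExamples.Squares(10).Count(n => n % 2 == 0) counts the even squares without ever holding all ten values in memory at once.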

The first is a property of Stack<T>, while the second is a LINQ extension method

I don’t think this is a complaint to be honest, but since we’re looking at performance let me just point out that the overhead of calling a LINQ extension method is virtually nothing. They are baked in at compile time and are static calls. Static calls are marginally faster than virtual method calls because they don’t have to go through a vtable lookup. When compared with non-virtual calls (like the Count property on Stack<T>) they would be about the same speed or very slightly faster.
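A small sketch of what "baked in at compile time" means here. ExtensionCallDemo and its two methods are hypothetical names; Enumerable.Count is the real static method behind the extension syntax.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExtensionCallDemo
{
    // Extension-method syntax: reads like an instance call on the stack.
    public static int ViaExtensionSyntax(Stack<int> stack)
    {
        return stack.Count();
    }

    // The same call written out as the plain static method call that the
    // compiler binds the line above to.
    public static int ViaStaticSyntax(Stack<int> stack)
    {
        return Enumerable.Count(stack);
    }
}
```

Both methods make the same static call, so any cost difference against the Count property comes from what Count() does internally, not from the extension-method syntax.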

The first uses 3 IL instructions, while the second uses 34

So what? Firstly, do we know how those IL instructions wind up on the machine? It may be 3 vs 34 IL instructions and 5 vs 15 machine instructions for all we know, or 1 vs 250. Even if there was a 1 to 1 mapping between IL and machine instructions the performance difference would be minimal. An Intel Core i7 Extreme 965EE executes just over 76 billion instructions per second. So, the difference between these two, 31 instructions, would be about 0.4 nanoseconds (31 divided by 76 billion). That is a small fraction of a shake, the 10-nanosecond unit used to time the steps of an exploding hydrogen bomb.

So, unless you’re running this in a loop at least a million times, those 31 instructions aren’t going to make a huge difference. And yes, I do know that it’d be likely a lot more than 31 machine instructions, but even at an order of magnitude more, it still wouldn’t be worth worrying about unless you were in a very tight loop which sat in the hot path of your application.

The first runs in constant time, while the second is O(n)

Yes, this could indeed become a problem if your application was dealing with hundreds of thousands of items in your stack. In such a situation I would have to ask why you were keeping so many items in a stack though. Secondly, whilst Stack<T>.Count does indeed run in constant time, that is not a given for all implementations of ICollection.Count. For example if you look at the concurrent collections in the Task Parallel Library, their Count properties do not run in O(1), but in O(N).

However, this complaint is also completely misinformed. LINQ’s Count() operator actually checks to see if it’s dealing with an ICollection and if so, it calls the Count property. So, in fact LINQ’s Count() runs in O(1) time wherever possible. Oh, and in such a case, it does it in only 10 IL instructions too!
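The shortcut described above can be sketched roughly as follows. This is not the framework's source code: FastCount is a hypothetical helper written only to illustrate the technique, and exactly which interfaces the real Count() checks may differ between framework versions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CountSketch
{
    // Hypothetical helper, not the real Enumerable.Count: it shows the
    // "use the Count property when there is one" shortcut.
    public static int FastCount<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");

        // Generic collections such as List<T> expose Count directly: O(1).
        var generic = source as ICollection<T>;
        if (generic != null) return generic.Count;

        // Stack<T> and Queue<T> implement the non-generic ICollection.
        var nonGeneric = source as ICollection;
        if (nonGeneric != null) return nonGeneric.Count;

        // Fallback for plain iterators: walk the sequence, which is O(n).
        int count = 0;
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            while (e.MoveNext()) count++;
        }
        return count;
    }
}
```

Only the fallback branch enumerates anything, which is why the O(n) cost applies to lazy sequences rather than to collections that already know their size.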

The second version generates garbage, while the first does not

I have no idea what the author is referring to when he talks about generating garbage here. As I have shown, Count() is very nearly as efficient as calling Stack<T>.Count and also offers a level of abstraction higher than collections. Since it operates on iterators it can be vastly more efficient than a collection which has sparse items of interest, by which I mean we loaded lots of items into it, and then filter for a small subset which we want. What Shawn is discussing here is the fact that it generates objects on the heap, a serious concern for game developers like him. Happily, if the item being Count()ed is an ICollection, it won't do any such thing, and for most of the rest of us it probably doesn't matter that much.

I don’t know about you, but an efficient abstraction offering those kinds of advantages is a great idea in my book, not garbage!

Conclusion

So, do I think you should always use Count() instead of the Count property? Of course not! If you know you are dealing with a Stack<T>, then you should use its Count property. Any collection is likely to implement Count in the manner most effective for it. Count() can approach the efficiency of the Count property, but since its best case is to invoke the Count property, it is marginally more efficient to just use that property directly.

However, often it’s worth our while to operate at a higher level of abstraction. In such a case use Count() and its friends without guilt; they’re actually pretty damn quick. In short, don’t prematurely optimize.

 


Google have announced that they're going to launch a new operating system called Google Chrome OS. When I heard this, I was quite interested actually. I was fascinated to see what direction they would go with their OS. Microkernel, Nanokernel, Monolithic? Since they claimed they'd be making viruses and malware a thing of the past, I was interested to see how they'd implement security. How were they going to manage the transition between their existing web applications and local computing resources? I really like the Chrome browser, and use it for preference, finding its clean lines, responsiveness and reliability a welcome break from Windows *Crash* Explorer and SlowFox.

Given the fact that Microsoft had released the ground-breaking managed research OS Singularity 2 years ago, Google had a pretty high bar to jump, but they are nothing if not surprising. Singularity's use of managed code to avoid user/kernel mode switches yet remain secure was an innovation I had got hugely excited about when it came out, and I was expecting nothing less interesting from Mountain View. Well, I can say I was surprised. I was surprised to be so thoroughly disappointed with the reality. The Google Chrome OS is Linux with the Chrome browser included. Wow. Oh, wait, and a new window manager. I cannot stand the excitement. I am so enthused I might just ... *snore*

C'mon! What the hell is this? If you listen to the media, you'd think that some monstrously powerful OS had reared its head and was thumping Microsoft six ways from Sunday. For goodness sakes, they're putting a little lipstick on an operating system that has been notably unable to make any significant inroads onto the desktop in 13 years of trying. An operating system that has already been thumped off netbooks because consumers didn't like it.

Ah well, at least they're finished right? Ooops, no. They've just announced the project, and they need "a lot of help from the open source community to accomplish this vision", so it doesn't sound like they've done an awful lot so far. So, they already have Chrome running on Linux, there are already stripped-down Linux distros, so what work exactly is required? It looks like they want to make a window manager. Another one. To add to the list of KDE, GNOME, CDE, XFCe, GTK, the Apple skin and all the many, many others. Soon, every single Linux user will be able to have their very own window manager, which will be a fitting end for an operating system characterised by more splits than a nuclear reaction and bigger egos than Paris Hilton's friends.

No, sorry, this is a blowout announcement signifying nothing. If they had the cojones to truly take Microsoft and Linux on, it'd be news; this is boredom wrapped in disinterest. Another operating system where I can't play games, can't use Office, and can't play music. Yay!

After the fun and games of trying to get MS to implement an XML comparison based vaguely on standards, I decided to roll my own. It is implemented as an Extension Method called DeepEquals on XNodes, and it takes an enum parameter ComparisonOptions:

  • NamespacePrefix: This indicates that the namespace prefix of the element will be used to compare equality. In other words the following two fragments would compare as being different:

<test xmlns:x1='1'>
  <x1:first />
  <x1:second />
</test>

<test xmlns:x2='1'>
  <x2:first />
  <x2:second />
</test>

  • AttributeOrdering: This option indicates that the order of attributes is important for comparing equality.

  • CommentsAndNotations: This option indicates that comments will be used in determining whether two documents are equal.

  • EmptyTagStyle: This option determines whether the style of an empty tag would be used. In other words whether the following two fragments would compare as being different:

<test />

<test></test>

  • ElementOrdering: Like AttributeOrdering, this option indicates whether element ordering is important for equality.

Since these are Flags enumeration options, there are also a couple of combined enums:

  • StandardsBased: Consists of ElementOrdering and EmptyTagStyle.

  • MicrosoftImplementation: Consists of NamespacePrefix, AttributeOrdering, CommentsAndNotations, EmptyTagStyle and ElementOrdering.

  • SemanticCompare: Contains none of the above options.
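As a rough sketch, a Flags enumeration along these lines could look like the following. The option names come from the lists above, but the numeric values, the layout, and the Has helper are assumptions for illustration and are not taken from the downloadable project.

```csharp
using System;

// Hypothetical values: only the option names come from the post.
[Flags]
public enum ComparisonOptions
{
    SemanticCompare = 0,
    NamespacePrefix = 1,
    AttributeOrdering = 2,
    CommentsAndNotations = 4,
    EmptyTagStyle = 8,
    ElementOrdering = 16,

    // Combined options built from the individual flags.
    StandardsBased = ElementOrdering | EmptyTagStyle,
    MicrosoftImplementation = NamespacePrefix | AttributeOrdering
        | CommentsAndNotations | EmptyTagStyle | ElementOrdering
}

public static class ComparisonOptionsDemo
{
    // True when every bit of 'flag' is set in 'options'.
    // Note that SemanticCompare (0) is trivially "set" in any value.
    public static bool Has(ComparisonOptions options, ComparisonOptions flag)
    {
        return (options & flag) == flag;
    }
}
```

Because each individual option occupies its own bit, callers can combine them freely, for example ComparisonOptions.ElementOrdering | ComparisonOptions.NamespacePrefix.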

Performance-wise mine is slower than Microsoft's, which is understandable since I piggyback on top of their infrastructure a lot. However, if you're interested in comparing two documents in much the same way as Microsoft does, I've provided a method called FastMicrosoftXNodeCompare. Basically, I used this to prove to myself that Microsoft's implementation effectively implements a text comparison.

FastMicrosoftXNodeCompare takes two streams or files and iterates them looking for differences. It ignores spaces and considers the two styles of quotes ' and " to be equivalent. It does not validate the XML, it does not check whether opening quotes and braces are closed or anything like that. It is also not Unicode compliant. As I say, I wrote it more to prove something to myself than anything else. It does compare documents twice as fast as XNodeEqualityComparer though.
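The idea of such a text-level comparison can be sketched in a few lines. This is not the code from the project: LooseTextCompare is a hypothetical class, and for brevity it works on in-memory strings rather than on streams or files.

```csharp
using System.Text;

public static class LooseTextCompare
{
    // Drops all whitespace and maps ' to " so both quote styles match.
    private static string Normalize(string xml)
    {
        var sb = new StringBuilder(xml.Length);
        foreach (char c in xml)
        {
            if (char.IsWhiteSpace(c)) continue;
            sb.Append(c == '\'' ? '"' : c);
        }
        return sb.ToString();
    }

    // Pure text comparison: no parsing and no validation of the XML,
    // so attribute order and empty-tag style still count as differences.
    public static bool AreEqual(string left, string right)
    {
        return Normalize(left) == Normalize(right);
    }
}
```

Like the method described above, this treats the documents as plain characters, which is exactly why it is fast and exactly why it knows nothing about XML semantics.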

You can download the project here.

Next time you wonder to yourself why a bug exists in Microsoft software, consider the possibility that Microsoft simply want it that way. Some time ago, I wanted to compare two XML documents. Growing despondent about the idea of writing such a system myself, I cast around for options, and encountered the XNodeEqualityComparer. I was thrilled, and made use of it throughout my code.

Some time later I started encountering problems. It seemed that the comparer was marking documents that were identical as being different. When we investigated, we found that this comparer was failing on two main issues. The first was the closing style of tags. It was picking up these two fragments as different:

<setting></setting>

<setting/> 

I must admit I was a bit surprised. Virtually no software that I am aware of sees these two as different, although they are very slightly different according to the W3C specification. This was annoying, but not a complete show stopper. The next error was a little more of a problem. It seems that the XNodeEqualityComparer also picks up attribute ordering as making the documents different.

Thus it would see these two fragments as different:

      <setting name="DefaultFileAcquisitionFolderPath" serializeAs="String">

      <setting serializeAs="String" name="DefaultFileAcquisitionFolderPath">

Now, this one was a killer for me. Our XML was coming from various systems and they had slight differences in their attribute ordering. We could do nothing about these differences whatsoever. I logged the issue with Microsoft, wrote a workaround and forgot about it. After a short while it came back that they wouldn't fix it, they pretty much said that their implementation was correct. This startled me, since I was pretty sure that XML attribute ordering means absolutely nothing. I did some investigation and found this part of the W3C Specification section 3.1:

[Definition: The beginning of every non-empty XML element is marked by a start-tag.]

Start-tag

[40]    STag    ::=    '<' Name (S Attribute)* S? '>' [WFC: Unique Att Spec]
[41]    Attribute    ::=    Name Eq AttValue [VC: Attribute Value Type]
[WFC: No External Entity References]
[WFC: No < in Attribute Values]

The Name in the start- and end-tags gives the element's type. [Definition: The Name-AttValue pairs are referred to as the attribute specifications of the element], [Definition: with the Name in each pair referred to as the attribute name ] and [Definition: the content of the AttValue (the text between the ' or " delimiters) as the attribute value.] Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.

Please re-read that last line: "Note that the order of attribute specifications in a start-tag or empty-element tag is not significant."
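For illustration, here is a minimal sketch of a comparison that follows that sentence of the specification. EqualIgnoringAttributeOrder is a hypothetical helper, not the DeepEquals extension from my project; it handles only element names, attributes, child elements and the text of leaf elements, and ignores comments and mixed content.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AttributeOrderDemo
{
    // Hypothetical helper: attributes are compared as an unordered set,
    // child elements are compared in document order.
    public static bool EqualIgnoringAttributeOrder(XElement a, XElement b)
    {
        if (a.Name != b.Name) return false;

        // Sort attributes by name so their original order no longer matters.
        var attrsA = a.Attributes()
            .OrderBy(x => x.Name.ToString(), StringComparer.Ordinal)
            .Select(x => x.Name + "=" + x.Value);
        var attrsB = b.Attributes()
            .OrderBy(x => x.Name.ToString(), StringComparer.Ordinal)
            .Select(x => x.Name + "=" + x.Value);
        if (!attrsA.SequenceEqual(attrsB)) return false;

        // Leaf elements: compare their text content.
        if (!a.HasElements && !b.HasElements) return a.Value == b.Value;

        var kidsA = a.Elements().ToList();
        var kidsB = b.Elements().ToList();
        if (kidsA.Count != kidsB.Count) return false;
        for (int i = 0; i < kidsA.Count; i++)
        {
            if (!EqualIgnoringAttributeOrder(kidsA[i], kidsB[i])) return false;
        }
        return true;
    }
}
```

Given two setting elements that differ only in the order of their name and serializeAs attributes, XNode.DeepEquals reports them as different, while this helper reports them as equal.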

Accordingly, I recreated the bug report (since there is no way to request one to be reopened), and included the above information. In the arguments that followed I pointed out that despite all the code that XNodeEqualityComparer calls (specifically the abstract DeepEquals on XElement), it to all intents and purposes does the following:

string value1 = node1.ToString();
string value2 = node2.ToString();

return value1 == value2;

 

Which makes me wonder what point XNodeEqualityComparer has. It ignores the XML specification, ignores how XML itself works and provides no value over a simple ToString. In order to do this it has a great deal of code that is completely and utterly pointless.

 

My last communication from Microsoft before they closed the bug as By Design was the following:

Hi Sean,

This is by design.XNodeEqualityComparer was not designed to stricly adhere to the xml spec.Most people expect attribute ordering to be significant and hence XNodeEqualityComparer was designed that way.

thanks
Nithya Sampathkumar
Program Manager

So, there you have it. If you're using XML and are wondering why the results you're getting are not the same as what the specification says you should be getting, the answer is simple. Microsoft write their code to fit people's expectations of what the specification says rather than what it actually says. I was also a little taken aback about their assertion that most people consider attribute ordering to be significant. When I asked around no-one seemed to.

So, a question to you all: do any of you consider XML attribute ordering significant when comparing documents for equality?

Update: Well, the answer seems to be an overwhelming no, both here and in the reddit thread, so I'm confused about where Nithya gets her "Most people".

Anyway, I have created a little class that implements an XML comparison more, ahem, correctly than Microsoft's. I have also created a byte comparison which shows that Microsoft's implementation is virtually the same as a text compare, but twice as slow. You can read about it here.

There's been a lot of talk about the iPhone and how it exemplifies good design, yadda yadda yadda. However, what I'd like to touch on today is a piece of design which frankly left me convinced that every person on an entire product team should be fired. I have this feeling often, but in this case it's not a feeling, it's a fact. Every single person on the Microsoft Windows Mobile team should be fired immediately. No exceptions.

How can I possibly say that? Well, given the numerous bugs, inconsistencies, and general usability nightmares that are Windows Mobile I'd think it'd be pretty clear. However I'm not going to address that. I'm not going to address companies like HTC whose, should we say, unique ideas about how to implement a keyboard leave me taking 4 times as long to write an SMS with my QWERTY keyboard as my girlfriend with her 1234 keyboard. I'm similarly not going to scoff about the idiocy that is the Mobile contact system, or even worse the stupid inconsistencies between how you address an MMS as compared to an SMS. And if that blasted dictionary suggests ONE more hugely long word when the far more common short word fails to appear in its choices...*sob*

 ...

....

Okay, I'm back. I've dried my eyes and am now prepared to recount one of the most STUPID WTFs in programming history!

Imagine some friends at a table having dinner. Imagine that one of the friends, being a complete moron, has actually been so mindbogglingly stupid as to have shelled out actual cash for the POS that is a Microsoft Windows Mobile HTC POC*1000 (Piece of Crap, new model). Let us imagine that these friends have an argument, let us further imagine that the Stupid Moronic Twit decides to settle the argument by opening the relevant Wikipedia page on his POC1000. Let us now skip past the imaginary scrolling down and down and down because the POC1000 renders Wikipedia as a 5-char wide web page.

Let us now arrive at the delightful point when he starts reading the portion that proves him right: "...to its diameter in Euclidean space... huh! WTF!?!" as the phone brings up, for a split second, a dialog reading something along the lines of "The storage system is short of space, please clear some space immediately".

Being a well trained monkey, used to the insane vagaries of the POC1000 (e.g. the random requirement to align the screen before continuing, no matter that you were trying to phone the ambulance service or police) he immediately enters Settings, opens the Clear Storage application, ... and... the phone reboots. *Sigh* ah well, that's what you get for choosing a company that can't write software to save their lives, isn't it?

Not so fast. Gather round to feel the true horror...

"Wait, why is it telling me that it's setting itself up for the first time?"

"Why is it making me answer stupid Customer Improvement queries?"

The reality of course being that we don't need better customers, we need a better f****ing supplier.

"Please, please, oh dear sweet Lord, please tell me that it hasn't wiped everything on my phone because it was marginally short of space for 5 seconds?!?!?"

Surely any halfway competent engineer would know to delete the temp files and temporary internet files before moving on to contacts, emails, SMS's and system settings? Ah, there you see the problem. We're assuming that the numbnuts who coded this nightmarish piece of fly strewn horse manure were in fact competent.

So here's my question for the day: "When is it acceptable to wipe your user's data in order to ensure that your stupid program can keep on wasting resources?"

An extra 10 points if you work on the Microsoft Windows Mobile team.

This is basically just a quick response to Roger's "WTF on WTF" rant since he doesn't allow comments. Basically, my advice boils down to: get over it. Someone, somewhere will always call WTF on your code. The first thing to remember is that it's your code the person is critiquing, not you. The second thing to keep in mind is that you can use such criticism to become a better developer. What doesn't work (trust me on this) is becoming defensive and angry.

"What makes a person so arrogant to mock somebody elses error?"
Skill, experience, and yes a fair bit of arrogance. All the best developers tend to be a bit arrogant. Keep in mind the reason Pieter was writing his WTF articles: to educate developers on what to do, and what to avoid. He could have done this with a lengthy article about the pros and cons (like I would write), but instead he chose to keep them short and to the point. The easiest way to do this is to just say "A is bad, B is good", or just "A is bad". Simple, easy little "rules" to help all the other developers stay on the track.

"Not everybody is at the same level in programming"
Absolutely, which is why articles that point out common mistakes that people make are good. By writing helpful articles, bad habits in the readers are hopefully reduced, and such developers become better. You only improve via learning, and the only way you'll learn is if you pay attention to teachers.

"Some are experts in one language but when migrating to another language you might be used to certain methodologies"
Indeed, but again, how are you going to learn that these mistakes are bad except by paying attention to the guys who are good in your chosen new language? In any case, I can't find any recent posts that point out something that is bad in C# but good in another language. All of the items give pretty good advice which would apply to any relevant language. I have my own minor disagreements with one or two of Pieter's posts, but on the whole, I'd still recommend them without a moment's hesitation.

"There are many reasons why bad code slips in"
Absolutely, 100% agreed. Roger goes on to mention legacy code, which is a common, but by no means exhaustive source of bad code. In my experience, the number one cause of bad code is developers who refuse to listen to criticism. The moment you mistake a critique of your code for a critique of you, you get defensive, shut down, and get upset. This ensures that you do not learn from your mistakes. Your code isn't you, and frankly you should be grateful that people are picking up your mistakes and letting you know about them.

Conclusion
Roger then goes on to say

"Writing code in a programming language is like writing a document in the English language. Would you post a blog if you found a grammatical error in somebody else's written document? Or if a foreigner gets their grammer mixed up do you laught at them? No, its stupid and a waste of time."
Umm, do this search "gramattical errors blog" and you'll find almost 3 million results. So, yes, people do critique grammatical errors, for the same reason we critique code: to hold up bad examples and tell people "don't do it that way, this way is much better".

So, a plea to all developers who have seen some code they've written wind up in a WTF-type article, use the experience positively, don't get defensive about it, don't get angry. Use it as a learning point. Maybe that mistake is not the only one, maybe it's just a corner of a whole raft of mistakes you're making. A little research could massively improve the quality of your code. Even better, maybe there was a very good reason for what you did. You can then explain why you did it that way in order to teach that there is not always an absolute right and wrong, that sometimes there are exceptions. In a sense, that's pretty much what Raymond Chen's blog The Old New Thing is about, explaining the reasoning behind things in Windows that seem silly or odd.

The whole "Is LINQ to SQL dead or not" thing is making me angry and irritated all at the same time. However, I've decided that instead of complaining I should rather offer a public service. So, what I'm going to do is try to teach Tim Mallalieu, the Program Manager for LINQ to SQL and LINQ to Entities, some straightforward English. He's written some clear posts before, notably his defence of Entity Framework against the Vote of No Confidence in it, so he's certainly capable of writing without vagueness. However, his two posts about the future of LINQ to SQL are pretty vague and covered in weaselese, so I'm going to try and help him regain his clarity.

Let's start with Update on LINQ to SQL and LINQ to Entities Roadmap:

Since the release of LINQ to SQL and the Entity Framework, many questions have been raised about the future plans for the technologies and how they will relate to each other long term.
During this week of PDC we are now at a point, with the announcement of Visual Studio 10 and the .NET Framework 4.0, that we can provide more clarity on our direction.
We have seen great momentum with LINQ in the last year.  In .NET Framework 3.5 we released several LINQ providers, including LINQ to SQL which set the bar for a great programming model with LINQ over relational databases.  In .NET 3.5 SP1, we followed up that investment with the Entity Framework enabling developers to build more advanced scenarios and to use LINQ against any database including SQL Server, Oracle, DB2, MySQL, etc.
We’re making significant investments in the Entity Framework such that as of .NET 4.0 the Entity Framework will be our recommended data access solution for LINQ to relational scenarios.  We are listening to customers regarding LINQ to SQL and will continue to evolve the product based on feedback we receive from the community as well.

Tim Mallalieu
Program Manager, LINQ to SQL and Entity Framework

Okay, there's a whole lot of filler here, so let's see if we can trim it down for him:

I'm going to tell you about the future direction of LINQ to SQL and LINQ to Entities. LINQ to SQL was unaccountably popular after its release and has remained popular even though we released Entity Framework. We've spent a lot of effort making Entity Framework more compelling, but it doesn't seem to have helped its popularity enough.

So, we've decided to just do bug fixes on LINQ to SQL from now on and concentrate all our efforts on Entity Framework.

He followed this post up with a real gem, "Clarifying the message on L2S Futures." Let's translate it piece by piece:

There has been a variety of responses to the post on L2S futures in the last couple of days.

Let me start by saying sorry for the radio silence since the post but, as Elisa mentioned,  we posted it while at PDC and were focusing on the interactions there over the last couple of days.

There are a couple specific points that I would like to clarify:

This is not too bad I must admit, but let's look at that phrase "a variety of responses". Hmm, if I look at the comments on the previous post, I get the following:

  • 32 pro LINQ to SQL or anti Entity Framework
  • 6 pro Entity Framework
  • 5 mixed
  • 14 trackbacks or other types of comments

Admittedly, I gave up counting about halfway through the comments out of sheer boredom, so there could be an uncounted late surge for Entity Framework, but I'm going to call this one. Anyway, if we look only at the comments related to this issue we get:

[Graph: comment counts by category, excluding trackbacks and other types of comments]

Hmm, I admit that this could be called a "variety of responses", but my money would be on "overwhelmingly negative responses".

Anyway, let's continue:

Is LINQ dead?

No… heck no…

There is a big difference between LINQ to SQL and LINQ.

LINQ is Language Integrated Query, today we ship the following sources over which one can execute a LINQ query:

  • DataSet : LINQ to DataSet
  • XML: LINQ to XML
  • In Memory Objects: LINQ to Objects
  • ADO.NET Data Services (Astoria) Client: LINQ to Data Services
  • LINQ to Relational Databases:
    • LINQ to SQL – the technology we were discussing in the original post
    • Entity Framework (LINQ to Entities)

There are many other people in the company and broader community working on LINQ solutions to other data sources. We are also excited to see LINQ being applied to many cool new problems beyond the typical data access scenario.

The discussion in the post was about the difference in investment level that we will have going forward regarding LINQ to SQL and the Entity Framework. LINQ itself, is a fundamental technology that we will continue to bet heavily on.

Okay, I'm not sure how many people thought that LINQ itself was being killed, but I'm guessing not many, so this portion is a clear strawman that serves no useful purpose, and we could cut it.

So what exactly is the announcement about?

Over the last few months we have been looking at how to carry forward LINQ to SQL and LINQ to Entities. At first glance one may assert that they are differentiated technologies and can be evolved separately. The problem is that the intersection of capabilities is already quite large and the asks from users of each technology takes the products on a rapid feature convergence path. For example, common asks for LINQ to Entities (that are being delivered with .NET 4.0) are POCO and Lazy Load.  Similarly, in the LINQ to SQL side we are being asked to provide new mapping strategies and other features which the EF already has. Additionally there are common asks for new features like UDT’s and better N-Tier support for both stacks. The announcement really centers around the point that,  after looking hard  at this and triangulating with internal partners and customers, we decided to take the EF forward with regards to the overall convergence effort and over time provide a single solution that can address the various asks.

Buzzphrase list: "carry forward", "differentiated", "evolved", "intersection", "asks", "convergence", "strategies", "triangulating", "solution"

Right, let's try this again:

So what exactly is the announcement about?

Over the last few months we've been thinking about the future of LINQ to SQL and LINQ to Entities. Initially they may seem to be different technologies, but there are significant overlaps. Many feature requests for the one technology already exist in the other, and many others are common to both. The announcement basically says that we decided to settle on a single solution, and that would be Entity Framework.

Next part:

Is LINQ to SQL Dead?

We will continue make some investments in LINQ to SQL based on customer feedback. This post was about making our intentions for future innovation clear and to call out the fact that as of .NET 4.0, LINQ to Entities will be the recommended data access solution for LINQ to relational scenarios. As mentioned, we have been working on this decision for the last few months. When we made the decision, we felt that it was important to immediately to let the community know. We knew that being open about this would result in a lot of feedback from the community, but it was important to be transparent about what we are doing as early as possible.  We want to get this information out to developers so that you know where we’re headed and can factor that in when you’re deciding how to build future applications on .NET.  We also want to get your feedback on the key experiences in LINQ to SQL that we need to add in to LINQ to Entities in order to enable the same simple scenarios that brought you to use LINQ to SQL in the first place.

Hmm, what are "investments...based on customer feedback"? I also love all this talk about being open and transparent whilst using weaselese in abundance. Here's the translation:

Is LINQ to SQL Dead?

Yes, but we will fix bugs.

Tim Mallalieu
Program Manager, Entity Framework

Conclusion

Personally, I'm kind of conflicted about this. On the one hand I see their point that there should only be one data access strategy, and LINQ to Entities is clearly the bigger brother in this regard. If it were any other people running this, I'd be cheering from the sidelines. However, the Entity Framework team have a history of ignoring their customers, being opaque and secretive, and delivering a bloated and inefficient product that did not address the most common requirements, despite being warned that this would happen. They've promised to be more transparent and open, but these two posts are opaque and vague.

I wish them luck, and hope that they can indeed deliver a system that merges the best of LINQ to SQL and LINQ to Entities.

However, until such time as they actually start speaking English and being forthright, I won't believe them.

Right, so in Part 1 we looked at the requirements for a Discovery Service and in Part 2 we looked at how we'd host the WCF service. Now it's time to look at the UDP itself. First off though, if you look in Part 2, at the declaration I've got for DiscoveryServer, you'll see I'm inheriting from a DiscoveryBase. This base class will handle most of the UDP stuff, since both DiscoveryClient and DiscoveryServer need a very similar set of operations.

Let's look at the most important method on DiscoveryBase:

private UdpClient _udpClient;
private long _connected;
private int _timeToLive = 2;
private IPAddress _address;

internal void Prepare(IPAddress address, int localPort)
{
    // Ensure we're not already discovering
    if (Interlocked.CompareExchange(ref _connected, 1, 0) == 1)
        throw new InvalidOperationException("Already in discovery mode.");

    // Create the client
    _udpClient = new UdpClient(localPort);

    // If it's broadcast we set the broadcast flag
    if (IsBroadcast(address))
        _udpClient.EnableBroadcast = true;

    // If it's multicast we join the multicast group
    if (IsMulticast(address))
        _udpClient.JoinMulticastGroup(address, _timeToLive);

    // Store the location to connect to
    _address = address;
}

As you can see, it's not terribly complicated. There's a little bit of concurrency logic to ensure we're not already discovering, we create a UdpClient object, and then either set it to broadcast or join a multicast group depending on the IP address. Finally we cache the address so that our child classes can access it via the Address property. _timeToLive is an interesting field: it defines how many router hops any multicast packets are allowed to travel. For now, I've just set this to 2, but you could easily make it configurable.
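Prepare leans on two helpers, IsBroadcast and IsMulticast, that I haven't shown. Here's a rough sketch of how they could look. The class name AddressKind is made up for this example, and the broadcast check is an assumption on my part: it only recognises the limited broadcast address 255.255.255.255, not subnet-directed broadcasts.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical helpers: the post calls IsBroadcast and IsMulticast but does not show them.
internal static class AddressKind
{
    // True only for the IPv4 limited broadcast address, 255.255.255.255.
    // Subnet-directed broadcasts (e.g. 192.168.1.255) need the subnet mask
    // and are deliberately not detected in this sketch.
    public static bool IsBroadcast(IPAddress address)
    {
        return address.Equals(IPAddress.Broadcast);
    }

    // True for IPv4 addresses in 224.0.0.0/4 and for IPv6 multicast addresses (ff00::/8).
    public static bool IsMulticast(IPAddress address)
    {
        if (address.AddressFamily == AddressFamily.InterNetworkV6)
            return address.IsIPv6Multicast;

        byte firstOctet = address.GetAddressBytes()[0];
        return firstOctet >= 224 && firstOctet <= 239;
    }
}
```

With helpers along these lines, an address such as 239.255.255.250 would take the JoinMulticastGroup branch in Prepare, while 255.255.255.255 would only set EnableBroadcast.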

Now let's hop back to DiscoveryServer and have a look at how it kicks off its discovery:

public void Publish(IPAddress address, int port)
{
    Prepare(address, port);

    // Set up the discovery service
    _host = new ServiceHost<IDiscoveryService>(this, _address);
    Binding binding = BindingAddressParser.CreateTransportBinding(_address);
    _host.AddServiceEndpoint(typeof(IDiscoveryService), binding, _address);
    _host.Open();

    // Now asynchronously listen for any incoming requests
    Client.BeginReceive(Receive, null);
}

So, as you can see, it calls Prepare on DiscoveryBase in order to ready itself for UDP operations. It then begins hosting the IDiscoveryService on a transport and binding determined by an address which is passed into the DiscoveryServer's constructor. Finally it kicks off an asynchronous receive operation on the UDP client. Let's have a closer look at Receive:

private void Receive(IAsyncResult ar)
{
    if (Connected) // This will be false if the UDP client has been closed
    {
        IPEndPoint remoteEP = null;
        byte[] result = Client.EndReceive(ar, ref remoteEP);

        // Now asynchronously listen for any incoming requests
        Client.BeginReceive(Receive, null);

        // Is there actual information to fetch?
        if ((result != null) && (remoteEP != null) && (result.Length > 0))
        {
            string message = Encoding.ASCII.GetString(result);

            if (message == @"get\services")
            {
                // Handle the get services request by sending the address where the service information is hosted
                message = BindingAddressParser.ResolveAddressHostName(_address);
                byte[] response = Encoding.ASCII.GetBytes(message);
                Client.Send(response, response.Length, remoteEP);
            }
        }
    }
}

So, what happens is we first check that we're still connected (the Connected property on DiscoveryBase returns whether the _connected field we saw in Prepare is equal to 1). We then complete the pending receive and immediately kick off another asynchronous receive in case there are more requests on the way. Next we decide whether it's a valid request. If it is, we reply with the address where we hosted the IDiscoveryService.
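To make the wire exchange concrete, here's a minimal sketch of what the other end of that conversation might look like: send the "get\services" text to a server and read back the address it replies with. This is not the DiscoveryClient from this series; DiscoveryProbe and Query are hypothetical names, and the sketch assumes a plain unicast send to a known endpoint.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

// Hypothetical illustration of the request/response exchange;
// not the DiscoveryClient from this series.
internal static class DiscoveryProbe
{
    // The request text the server's Receive method checks for.
    public const string Request = @"get\services";

    // Sends the request to the given endpoint and returns the reply text,
    // or null if nothing arrives within the timeout.
    public static string Query(IPEndPoint server, int timeoutMilliseconds)
    {
        using (var client = new UdpClient(0)) // port 0: let the OS pick a free local port
        {
            client.Client.ReceiveTimeout = timeoutMilliseconds;

            byte[] request = Encoding.ASCII.GetBytes(Request);
            client.Send(request, request.Length, server);

            try
            {
                IPEndPoint remote = null;
                byte[] reply = client.Receive(ref remote);
                return Encoding.ASCII.GetString(reply);
            }
            catch (SocketException)
            {
                return null; // timed out, or the port was unreachable
            }
        }
    }
}
```

Sending to a broadcast address would additionally need EnableBroadcast set on the UdpClient, just as Prepare does on the server side.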

In Part 4 we'll start looking at the DiscoveryClient.

Right, so in Part 1 we looked at an overview of how this discovery system should work. Now let's start looking at specifics. Before we start diving into the UDP stuff, let's look at the information that the server will publish and how this will be accomplished. If you recall, the mechanism I suggested was that the server host a normal WCF service that held the information about the services, and that the UDP would be used to provide access to this service. So let's start with the WCF side.

First off we're going to require our ServiceInfo class. This is a relatively simple class, which contains an Address property and a set of Name/Value pairs. Needless to say, it needs to be decorated with the DataContract attribute. To provide access to this information we will have the IDiscoveryService interface:

namespace Sanity.Net
{
    [ServiceContract]
    internal interface IDiscoveryService
    {
        [OperationContract]
        ServiceInfoCollection GetServices();
    }
}

You'll note that I made this interface internal, since there's no real call for outside applications to access it without going through the Discovery system we're writing. Our next step is to create a DiscoveryServer class which will take the address it needs to publish as a parameter in its constructor. In order to keep things simple, I'm only going to allow the default binding for the transport mechanism. As I discussed in Part 1, discovery is not the place to get fancy with communication methods; it's where you find out about the fancy communication methods.

We'll create a Publish method on the DiscoveryServer that will create the service host and open the service endpoint:

private string _address;
private ServiceHost<IDiscoveryService> _host;

public void Publish()
{
    // Set up the discovery service
    _host = new ServiceHost<IDiscoveryService>(this, _address);
    Binding binding = BindingAddressParser.CreateTransportBinding(_address);
    _host.AddServiceEndpoint(typeof(IDiscoveryService), binding, _address);
    _host.Open();
}

Now, you'll notice my BindingAddressParser. That's just a little class that infers the binding from the address: NetTcpBinding for "net.tcp" and so forth.
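I haven't listed BindingAddressParser here, but a rough sketch of the idea might look like the following. Only the "net.tcp" to NetTcpBinding mapping comes from the description above; the other two mappings, and the class name BindingAddressParserSketch, are assumptions made for the example. It needs a reference to System.ServiceModel.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Channels;

// Hypothetical sketch; the real BindingAddressParser is not shown in this post.
internal static class BindingAddressParserSketch
{
    // Picks a default WCF binding from the URI scheme of the address.
    public static Binding CreateTransportBinding(string address)
    {
        string scheme = new Uri(address).Scheme; // Uri reports the scheme in lower case

        switch (scheme)
        {
            case "net.tcp":
                return new NetTcpBinding();
            case "net.pipe":
                return new NetNamedPipeBinding(); // assumed mapping
            case "http":
                return new BasicHttpBinding();    // assumed mapping
            default:
                throw new NotSupportedException(
                    "No default binding for scheme '" + scheme + "'.");
        }
    }
}
```

Sticking to default bindings keeps discovery boring on purpose: anything fancier can be advertised through the service information itself.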

Our next step is to add our ServiceInfoCollection to the class so that services can be given to the discovery server:

private ServiceInfoCollection _localServices;

public ServiceInfo AddService(string address)
{
    address = BindingAddressParser.ResolveAddressHostName(address);
    return _localServices.Add(address);
}

Now an interesting item is that ResolveAddressHostName call. If you think about it, publishing an address like "net.tcp://localhost:21021/MyService" is not terribly useful, since clients need to know where localhost actually is. So ResolveAddressHostName uses DNS to replace localhost with the machine's actual name. It will also resolve an IP address (e.g. 127.0.0.1) to the machine's name.
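For the curious, here's one way ResolveAddressHostName could be put together. Treat it as a sketch under my own assumptions rather than the actual implementation: the class name HostNameResolverSketch is hypothetical, loopback addresses are mapped to Dns.GetHostName(), and other IP addresses go through a reverse DNS lookup that can fail if the network has no record for them.

```csharp
using System;
using System.Net;

// Hypothetical sketch of the host name resolution described above.
internal static class HostNameResolverSketch
{
    public static string ResolveAddressHostName(string address)
    {
        var uri = new Uri(address);
        string host;
        IPAddress ip;

        if (uri.IsLoopback)
        {
            // Covers "localhost" as well as loopback IPs such as 127.0.0.1
            host = Dns.GetHostName();
        }
        else if (IPAddress.TryParse(uri.DnsSafeHost, out ip))
        {
            // Reverse lookup; throws SocketException if the address cannot be resolved
            host = Dns.GetHostEntry(ip).HostName;
        }
        else
        {
            // Already a host name, nothing to do
            return address;
        }

        // Swap in the resolved host, keeping scheme, port and path as they were
        var builder = new UriBuilder(uri) { Host = host };
        return builder.Uri.ToString();
    }
}
```

So an address like "net.tcp://localhost:21021/MyService" should come back with the machine's name in place of localhost and the port and path untouched.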

Finally, we can implement the IDiscoveryService interface:

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
public class DiscoveryServer : DiscoveryBase, IDiscoveryService
{
    ServiceInfoCollection IDiscoveryService.GetServices()
    {
        return _localServices;
    }
}
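ServiceInfo and ServiceInfoCollection are used throughout but never listed, so here's a minimal sketch of the shapes I described earlier: an Address plus a set of Name/Value pairs, decorated for WCF serialization. The Properties member name and the choice of a string dictionary are assumptions made for the example; only the Address property and the Add(address) call that returns a ServiceInfo come from the code above.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Runtime.Serialization;

// Hypothetical shapes; member names other than Address are assumptions.
[DataContract]
public class ServiceInfo
{
    public ServiceInfo(string address)
    {
        Address = address;
        Properties = new Dictionary<string, string>();
    }

    // Where the advertised service can be reached
    [DataMember]
    public string Address { get; private set; }

    // Arbitrary Name/Value pairs describing the service
    [DataMember]
    public Dictionary<string, string> Properties { get; private set; }
}

[CollectionDataContract]
public class ServiceInfoCollection : Collection<ServiceInfo>
{
    // Creates a ServiceInfo for the address, stores it and hands it back
    // so the caller can attach Name/Value pairs to it.
    public ServiceInfo Add(string address)
    {
        var info = new ServiceInfo(address);
        Add(info);
        return info;
    }
}
```

Returning the new ServiceInfo from Add is what lets AddService hand it straight back to the caller for decorating with extra information.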

Okay, we now have a server-side that will take services with arbitrary information and make them available on TCP/IP. In Part 3 we will look at making this available on UDP as well.
