May 2005 - Posts

Let's discuss your application being a memory hog. A lot of people think that the CLR will “take care of memory” and that they can therefore do what they damn well please with RAM. Whilst the CLR looks after RAM very well, that doesn't make it an infinite resource. I'm not going to get into disposable resources or the GC here; I'm concentrating on memory and its limits.

How much memory can your application use? You'd think that'd have a simple answer, and you'd be wrong. On a 32-bit system, simple logic suggests that 4GB of address space is available to us. Well, sort of. Windows reserves 2GB of this address space for itself, so your application can only address 2GB. To add insult to injury, .NET steals a fair whack of that as well. The CLR does a lot of things under the hood, and the GC's bookkeeping can take up an awful lot of space. There's no hard and fast rule here, but you'll normally have between 800MB and 1.2GB available to your application, so if you stick to under 800MB you should be fine.

So what can you do about this? You can decrease the amount of memory that Windows grabs for itself. This is a two-step process. First you need to boot Windows with the /3GB option set. Secondly, you need to link your app with the LARGEADDRESSAWARE switch. For .NET applications you can set this switch by running the following command:

link -edit -LARGEADDRESSAWARE myApp.exe

from the Visual C++ bin directory.

This will now mean that Windows will only grab 1GB of the 4GB address space, giving your app a corresponding boost in the memory it can use. I haven't tested this myself, but I'd guess that you should now have access to between 1.2GB and 1.8GB.
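
The address-space arithmetic so far can be sketched in a few lines. This is only an illustration of the figures quoted in this post (the function name is made up, and it says nothing about how much the CLR itself will eat):

```python
GIB = 1024 ** 3

def user_address_space(total_bytes, kernel_reserved_bytes):
    """Bytes of virtual address space left over for the process itself."""
    return total_bytes - kernel_reserved_bytes

total_32bit = 2 ** 32                                     # 4GB of addresses
default_user = user_address_space(total_32bit, 2 * GIB)   # Windows keeps 2GB
with_3gb = user_address_space(total_32bit, 1 * GIB)       # /3GB + LARGEADDRESSAWARE

print(default_user // GIB, "GB of user address space by default")
print(with_3gb // GIB, "GB of user address space with /3GB")
```

Remember that this is address space, not usable heap: the runtime's own overhead still comes out of whatever number you end up with.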

Of course, there is another option: you could just run your app on a 64-bit machine. Now the address space is 8TB. If we assume that Windows munches half of this, and .NET half of what's left, you'd still have 2TB of memory available to your app, and 2TB ought to be enough for anyone (shades of BillG's comments about 640KB come leaping to mind).

Given all this, the moment you start thinking about allocating huge amounts of memory (e.g. reading an 800MB file into memory), it's time to look at other options. The GC does not make memory an ample resource; it just makes sure that you don't have to worry about freeing it. You still have to have some idea of how big your memory footprint is going to be.
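
One of those "other options" is simply not holding the whole file at once. Here's a minimal sketch of the idea, written in Python purely for brevity (the same pattern applies to a .NET stream); the function name, chunk size and demo file are all assumptions made for illustration:

```python
import hashlib
import os
import tempfile

def checksum_in_chunks(path, chunk_size=1024 * 1024):
    """Hash a file while holding at most chunk_size bytes of it in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:              # empty result means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a small temporary file standing in for the 800MB one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 300_000)
    demo_path = tmp.name

streamed = checksum_in_chunks(demo_path, chunk_size=64 * 1024)
whole = hashlib.sha256(b"x" * 300_000).hexdigest()
os.remove(demo_path)
print(streamed == whole)
```

The memory footprint is now set by the chunk size you choose rather than by the size of the file.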

An AppDomain is a unit of code isolation. In Win32, the process acts as this unit: the operating system stops code in one process from accessing code in another. The only problem is that a process is a very large beast for such a task. Context-switching between processes is expensive and the overhead for having an active process is large.
To circumvent this, the bright sparks at Microsoft have introduced the concept of an Application Domain. Basically this is a lightweight process that exists within an operating system process. Due to the verification that the CLR performs, it can guarantee that code in one AppDomain cannot access code in another directly. As a result we have logical isolation, but not physical.

Now, our happy world is disrupted by a nasty thought. Logical isolation means that each assembly (and its associated bookkeeping data and static instance data) must be loaded into each AppDomain separately. This makes sense in theory, but can mean a helluva lot of duplication between AppDomains that exist within one process. For this reason domain neutrality was introduced.

What domain neutrality means is that the assemblies which are domain neutral will only be loaded into the process once, and all AppDomains will share a copy. This cuts down unnecessary duplication. However we still get some "duplication" since the data segments are duplicated in order to provide the appearance to our code that the AppDomains have each got their own copy of the assembly. Thus, the static variables of our assembly are not shared across AppDomains, even though the code, JIT images, and lookup tables are.

So what actually happens? Well, when your application starts up, the CLR bootstrapper creates the following:
SystemDomain - responsible for process-wide string interning, creating interface IDs and starting the other AppDomains.
SharedDomain - responsible for managing domain-neutral assemblies. mscorlib is always loaded into this domain.
DefaultDomain - This is where your application plays.

Most applications have only these three AppDomains, but you can create your own at runtime if you wish.

So what is loaded as domain-neutral code? That's a bit trickier. It depends on what option the runtime host used when calling CorBindToRuntimeEx. There are three options here:
STARTUP_LOADER_OPTIMIZATION_SINGLE_DOMAIN - Only mscorlib is loaded domain-neutral.
STARTUP_LOADER_OPTIMIZATION_MULTI_DOMAIN - All assemblies will be loaded domain-neutral.
STARTUP_LOADER_OPTIMIZATION_MULTI_DOMAIN_HOST - All strong-named assemblies will be loaded domain-neutral.

Interestingly, you can actually influence this during your application's execution. When you create an AppDomain, you can specify some options for its setup, specifically its LoaderOptimization option. This has four options relating to domain-neutrality: MultiDomain, MultiDomainHost, NotSpecified, SingleDomain.

Surprise, surprise, they appear to be the same as the CorBindToRuntimeEx options! Except for that NotSpecified option, of course. Well, they are related: the setting you specify here will override your application's setting for the created AppDomain. The NotSpecified option means that the created AppDomain will use the same setting as the AppDomain that is creating it.

So, what exactly happens if you have different rules between AppDomains? How then does the CLR decide whether an assembly is to be loaded domain-neutral or not? Each AppDomain's settings apply to that AppDomain. Imagine we have three AppDomains: A, B, and C. A and B will load our assembly domain-neutral, whilst C will load it unshared. This will result in the assembly residing in two places: SharedDomain and AppDomain C. AppDomains A and B will contain references to the copy of the assembly in SharedDomain.

Seems simple enough, doesn't it? However, there's a hitch. When we load an assembly as domain-neutral, all of its referenced assemblies have to be loaded domain-neutral too. This set of assemblies is referred to as an assembly's binding closure. You might think this would be fine since an assembly's binding closure can't change from AppDomain to AppDomain, right? Wrong. Each AppDomain can have its own assembly resolution rules.

So where does this leave us? Well, if the binding closures are identical, then all is well and good. However, if the binding closures are different, the CLR will create two copies of our domain-neutral assembly in SharedDomain. Any future AppDomains using this shared assembly will have their binding closures evaluated, and will get a reference to the appropriate copy (or a new copy made).
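
If it helps, here is a toy model of that rule. It is emphatically not how the CLR stores things internally; it just treats SharedDomain as a dictionary keyed on the assembly plus its binding closure, with hypothetical assembly names (MyLib, Helper) and three hypothetical AppDomains:

```python
def load_domain_neutral(shared_domain, assembly, binding_closure):
    """Return the shared copy matching this closure, creating one if needed."""
    key = (assembly, frozenset(binding_closure))
    if key not in shared_domain:
        shared_domain[key] = {"assembly": assembly, "copy": len(shared_domain) + 1}
    return shared_domain[key]

shared_domain = {}

# The first two AppDomains resolve MyLib's reference to Helper 1.0.
first = load_domain_neutral(shared_domain, "MyLib", {"mscorlib", "Helper 1.0"})
second = load_domain_neutral(shared_domain, "MyLib", {"mscorlib", "Helper 1.0"})

# The third has a binding policy that redirects the reference to Helper 2.0.
third = load_domain_neutral(shared_domain, "MyLib", {"mscorlib", "Helper 2.0"})

print(first is second)        # same closure, same shared copy
print(len(shared_domain))     # number of copies of MyLib in the "SharedDomain"
```

Identical closures share one copy; a different closure costs you another copy of your supposedly shared assembly.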

All this is fun and good, but what impact does it have on you? Well, first off it's a good idea to know what setting your DefaultDomain starts up as. SingleDomain is the default .NET setting, whilst ASP.NET uses MultiDomainHost. The next thing to keep in mind is that loading shared assemblies with different binding policies may defeat the purpose of having shared assemblies. The point of shared assemblies is that you're trading a little speed for a smaller memory footprint. If you're throwing away the smaller footprint by having non-shared shared assemblies, then the performance hit isn't going to be worth it.

So how do we work out if an assembly is domain-neutral? Well, in .NET 1.0 and 1.1, you can try to work it out based on knowledge of the assembly coupled with knowing what LoaderOptimization option the AppDomain started with. In .NET 2.0, System.Diagnostics.Loader provides the method AssemblyIsDomainNeutral, which will give you this information. In addition, .NET 2.0 also provides runtime hosts with much finer-grained control than the broad options outlined above. You can implement the GetDomainNeutralAssemblies method of IHostControl to give your host complete control over domain-neutrality.

References:
Junfeng Zhang's brilliant article on domain-neutral assemblies, Chris Brumme's article on AppDomains.

or ... Yet Another Whiny Gripe about .NET

In honour of the Return of the Sith (which I'm going to see tomorrow), imagine the Imperial March playing in the background throughout this diatribe.

Right, so what whiny pansy-assed opinion am I going to attack this time?

Why does .NET make it so easy to decompile my code?
Actually this is a pretty good question, in that it appears to have a simple answer ("because MS wanted it that way"), but in fact the reason is more complex. It turns out that this is an emergent feature of a confluence of factors, the most important of which is verification. So what is verification? It's the process that the JIT compiler performs to ensure that your code isn't going to break out and guzzle your beer, or worse.

There are basically two broad results of a verification procedure:
  • Verifiably type-safe - This has been proven to be type safe.
  • Scummy - This code could be doing unspeakable things with your new HDD.
 
Well, so what does this have to do with decompilation? You see, JIT has to be fast, and it has to be accurate. If it's operating off complicated inputs, then it will be neither fast nor accurate. When you add into this mix all the little performance optimisations that the JIT will try to do for you, we quickly see that a complex input would result in either unacceptably long JIT times, or unacceptably bad JIT results.
 
I'm pretty sure that MS didn't go out and say "let's write a system that can be easily reverse-engineered"; it just happens to be a logical result of making a verifiable, type-safe, JIT-ted system.
 
So what can I do about it?
Get out of delivering products to consumers. Frankly, that's the only way you'll reliably keep your IP secure. The problem is that when you give something to your customers, they can by definition reverse engineer it. If they couldn't reverse engineer it they wouldn't be able to make it work. This is why things like DRM are bad technology. They work off the assumption that the consumer cannot illicitly decrypt the data, but they have to provide the means of decryption to the consumer in order to allow the user to view the movie, hear the song and so forth. So, this means that by reverse engineering the decryption mechanism one can circumvent the DRM protection.
 
Now, .NET does make this reverse engineering process easy, especially when using tools like Lutz Roeder's Reflector coupled with such nifty add-ins as Denis Bauer's File Disassembler. So what can one do about this? You could use an obfuscator like Dotfuscator or XenoCode. These tools will garble your internal names, mess around with your logic while retaining its workings, and encrypt strings to make it harder for hackers to find that piece of your code that shows the "Trial Period Expired" message. Just as a bonus, obfuscators will generally cut down your code size by, for example, renaming your "CustomerDetails" class to "a".
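
The renaming part is easy to picture with a toy. This sketch is not how Dotfuscator or XenoCode actually work (they operate on IL and metadata, and do far more); it only shows the idea of swapping meaningful names for the shortest meaningless ones. Every identifier in it is made up for the example:

```python
import itertools
import string

def short_names():
    """Yield a, b, ..., z, aa, ab, ... indefinitely."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)

def build_rename_map(identifiers):
    """Map each identifier to the next unused short name, in order."""
    names = short_names()
    return {identifier: next(names) for identifier in identifiers}

rename_map = build_rename_map(["CustomerDetails", "OrderHistory", "IsTrialExpired"])
saved = sum(len(old) - len(new) for old, new in rename_map.items())

print(rename_map)
print(saved, "characters of metadata saved")
```

The logic is untouched, but everything the names used to tell a reader is gone.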
 
What this is basically doing is removing some of your documentation. If you think about it, you'll see that there's quite a bit of documentation associated with your code. The class and member names are documentation. The comments are documentation. The unit tests, help files, error messages, stack traces, debug messages, trace messages, instrumentation, debug symbols, and data files are all documentation.
 
Now, when you ship a compiled (non-obfuscated) product you're removing the comments, the actual "official" documentation, the debug messages, and the debug symbols. A hacker could easily recreate everything except the comments and the documentation itself. So then the question becomes: "how important are the comments and documentation to understanding my code?"
 
In my opinion, they're critically important. It's hard enough understanding someone else's code even with these aids. Think about how long it takes a new developer on your team to come fully up to speed with your project, now double that time for not having mentoring, double it again for not having documentation, and double it again for not having comments. If nothing else this should inspire you to go write documentation and comments for your project. Throw in an obfuscator and you could probably double this time again, making your obfuscated, undocumented, uncommented C# about as easy to read as well commented, well documented C++ code (Sorry, couldn't resist).
So, my advice is: stop worrying about people copying you, slap a "do not reverse engineer" clause into your EULA and work on more productive things like, I don't know, improving your product. Oh, and make sure you document and comment, it'll give you a competitive advantage against people who steal your code...

I've released a slight patch to the Diff Add-In. The new version is version 0.71, and has the following changes:
  • If it cannot match an assembly automatically, it will not pop up an annoying dialog, it will merely populate the right pane with blank lines.
  • There is now an ellipsis button just above the right pane that will bring up a combination framework assembly list/file selection to allow you to select a DLL.
  • It will now remember the last 20 manual matches, so if you commonly diff one assembly to a specific location, and choose the right pane item for it, it will open it automatically next time. Manual matches obviously override automatic matches so if you want to diff mscorlib 1.1 to System.Data 2.0, it'll keep that link.
  • The bug where sometimes the panes did not resize to fill the diff tool is now fixed.
  • Scroll synchronisation has been improved.
 
And now for the DOH! moment: When I moved to my new domain, I brought across version 0.6 of the add-in, and forgot to upload 0.7.

IT'S ALIVE!!! IT'S STILL ALIVE!!

Yep, believe it or not, but I still have some whiny gripes about .NET that I want to ridicule and belittle.

Why isn't .NET Complete?
This interesting little belief is silly. The belief can be summarized into the following axioms:
  • .NET is supposed to be a complete Framework
  • The .NET components call into non-.NET components & libraries
  • Not everything Microsoft has ever written has been ported to .NET yet
 
THEREFORE: .NET is not a framework, but rather a facade
 
Well, let me try to challenge these axioms one by one in a mature and restrained way...
 
Nah.
 
.NET is supposed to be a complete Framework
 
Utter crap. Anyone making that assertion doesn't understand the first point about Interop. There's no such thing as a complete framework anywhere on earth. The various Framework components are, however, more than enough for most applications. When you pull Interop into that equation, .NET is more than complete enough for all but the most esoteric applications. In those few situations, you're welcome to use mixed-mode assemblies, or failing that, write a COM component and use COM Interop to use it in your .NET component.
 
That's the whole point of the various flavours of Interop, to allow you to use existing code that wasn't written in .NET.
 
The .NET components call into non-.NET components & libraries
 
So what? They'd always have to do this to some degree, unless the whole OS was written in .NET. Here's a wake-up call: Java calls into non-Java components and libraries too. Guess what? C++ calls into non-C++ libraries as well! Oh my word! That makes C++ a facade. Utter rubbish. I challenge anyone to write an OS without using a line of assembler. Does that make all operating systems a facade? In a very narrow sense, I suppose it does, in the same way that a skyscraper is just its foundation.
 
Not everything Microsoft has ever written has been ported to .NET yet
 
Duh! The argument here appears to be that since some Microsoft code (i.e. most of it) has not been fully ported to .NET yet, it's in some way not pure to call into it with Interop, and that therefore Microsoft is not fully committed to .NET. Basically, when a new technology comes out we're supposed to completely toss all of the old stuff out the window. If we don't, then we're not being pure. Stuff that! That means that when I got a cellphone, I should have got rid of my fixed-line phone...
 
Okay, so I did, but that doesn't detract from my point. A new concept very rarely completely supersedes the old; it generally builds on it and extends it. Take the Theory of Relativity as an example. Was Einstein supposed to throw out all mathematics and physics in order to develop his new idea? Of course not, he built on Maxwell, Lorentz and Newton. Just because his work superseded theirs doesn't mean that their work couldn't be used by him. Nor does it mean that we can't use their work now. In fact, Newton's equations are far more useful in everyday life than Einstein's. This doesn't invalidate the value of Einstein's work either.
 
.NET is an addition to the programmer's toolkit, and a pretty comprehensive one, but is most definitely not a complete replacement for all existing tools.

More whiny gripes about .NET that make me want to reach for a crowbar:

.NET doesn't have the expressiveness of C++
I'm going to write this slowly, so please read slower too: .NET is what we like to call a platform, and C++ is what we call a language. Apples, oranges.

Okay, you can speed up now.

There's this funky thing called C++/CLI that allows you to write C++ that targets the .NET platform. There's also a horrible Quasimodo called Managed Extensions to C++, but the less said about that the better. Just note that it does indeed allow you to write C++ that targets .NET. Nasty looking unfriendly C++ to be sure. C++ with funky things like "__gc *" and "*" that look similar and behave differently. Horrible, horrible double underscored keywords all over the place...

Sorry about that. As said above, I won't go into MC++. Every time I look at code I've written in it I feel like crying. To be fair, looking at C++ code gets me pretty emotional at the best of times. I've lost track of the number of times I've started sobbing at the sight of such beauties as:
  • #define LOOP while(1)
  • #define class struct
  • #define private public
  • #define protected public
  • void QSort_Pnts ( LsA _B, LsA _E ) { if ( _E <= _B ) return; LsA B = _B - 1, E = _E + 1 ; int Pnts = ( * _B )->Pnts ; LOOP { while ( ( * ++ B )->Pnts > Pnts ); while ( E >= B && ( * -- E )->Pnts < ...
Waaaay more expressive. So expressive that you can express yourself into an early grave.
 
.NET doesn't have the power of C++
Power, in my not so humble opinion, flows more from libraries than from languages. How powerful is C++ without any libraries? Pretty damn useless. Let's not even get started on C#, since without libraries, C# quite simply cannot operate. So rephrase this as ".NET doesn't have as many powerful libraries as C++". Well, duh! C++ has been around since the Bronze Age.
.NET has pretty damn convincing libraries. It also allows you to call into C-style libraries like the Win32 API using P/Invoke. It provides the ability to call into COM libraries using COM interop. That gives .NET a huge leg up in the library department. Still not even close to what's available for C++, I suspect, but give it time.
 
.NET doesn't have the speed of C++
True. C++ will almost certainly outstrip any equivalent managed code in similar circumstances. That said, the most significant performance improvements come from improving your algorithms rather than changing your programming language. In any case, in most scenarios, performance isn't that important.
Performance can be faked. A bold statement I know, but it's often true. There's a world of difference between actual performance and perceived performance. Perceived performance can be improved (often at the cost of real performance) by judicious use of splash screens, progress bars and threading. Okay, I hear you shout, that's fine for the desktop, but what about the server? Well, there it can be improved by use of threading and scaling. What's faster:
  • A fast service running on a single server.
  • A 30% slower service running on two load balanced servers.
 
Well, the correct answer is the fast service. But from a perception point of view, the slower service is the faster one. Given that writing high-performing code often takes much longer than writing normal code, and given that custom written software is much more expensive than computer hardware, simple economics says that we should concentrate on writing easy code that can scale out.
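
To put some made-up numbers on that comparison: assume the fast service handles 100 requests per second on its one server, that "30% slower" means 70 requests per second per server, and that the load balancer splits traffic perfectly. All three are assumptions chosen for illustration, not measurements:

```python
def cluster_throughput(per_server_rps, servers):
    """Total requests per second, assuming perfect load balancing."""
    return per_server_rps * servers

def service_time_ms(per_server_rps):
    """Time one server spends on a single request, in milliseconds."""
    return 1000 / per_server_rps

fast_rps = 100          # hypothetical fast service
slow_rps = 70           # the same service, 30% slower

one_fast = cluster_throughput(fast_rps, servers=1)
two_slow = cluster_throughput(slow_rps, servers=2)

print(one_fast, "req/s on one fast server")
print(two_slow, "req/s on two slower servers")
print(round(service_time_ms(fast_rps), 1), "ms vs",
      round(service_time_ms(slow_rps), 1), "ms per request")
```

Each individual request takes longer on the slower service, but under load the pair of servers gets through more work, and that is what users notice.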
 
In addition, keep in mind the 90/10 rule. Your code spends 90% of its time in 10% of the code. Optimize that, not the unimportant 90%. And how do you know which is the important 10%? Well, unless you're a performance guru, you won't. Rather write all of your code for ease of coding and maintenance. If you find performance issues, see if you can improve the perceived performance first. If that's still not enough, then profile your code to find that 10%, and optimize that, and that only. And, yes, if it's so important to you, you can write that 10% in C++. Hell, write it in assembler for all I care.
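
The 90/10 rule is just Amdahl's law wearing a hat. A quick sketch, assuming (purely for illustration) that whatever you optimize ends up running twice as fast:

```python
def overall_speedup(time_fraction, local_speedup):
    """Amdahl's law: whole-program speedup when the part that accounts for
    time_fraction of the running time becomes local_speedup times faster."""
    return 1.0 / ((1.0 - time_fraction) + time_fraction / local_speedup)

# Double the speed of the hot 10% of the code (90% of the running time).
hot = overall_speedup(0.9, 2.0)

# Double the speed of the other 90% of the code (10% of the running time).
cold = overall_speedup(0.1, 2.0)

print(round(hot, 2), "x overall from optimizing the hot 10%")
print(round(cold, 2), "x overall from optimizing everything else")
```

Far less code to touch, and a far bigger payoff, which is why you profile first.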
 
Now, onto the JIT compiler. Keep in mind that C++ is optimised for the machine on which it is compiled, whereas .NET code is optimised for the machine on which it is running, courtesy of our JIT compiler. The dream is to compile your .NET code on your 32-bit machine, copy it to a 64-bit machine, and watch it take full advantage of the 64-bit environment. Currently, there are some issues with this, but I'm hopeful that these will be overcome (say in .NET 2.2). In any case, as various .NET performance issues are resolved (e.g. boxing in collections removed by generics), the performance of .NET will gradually approach that of C++.

I've started reading and posting on Usenet, and have come across a few whiny gripes about .NET that really get up my nose. Since I'm trying to be civil, I have restrained my righteous wrath in my posts, but need to blow off some steam.

Here goes!

Why is .NET so big?
Um, why is Java so big? Why is Linux so big? Or Windows? They're all platforms. They contain everything needed to run programs that do useful things. Java and .NET sit on top of other platforms, and call into said base platforms to do most of this stuff. However, they still have an awful lot of code to make said platforms' services appear more OO. Plus, they both add all sorts of funky code that is separate from the platform. Plus, they provide funky JITters to ensure that your code will be optimized for the machine it's running on rather than the machine it was compiled on. Plus, they provide all sorts of little goodies like GC and reflection that are not provided by the base platform.

So then, why is .NET 10MB bigger than Java? Well, that would be due to MS's OWBIS (One Whomping Big Install Strategy) as opposed to Java's MBIS (Multiple Big Install Strategy). I'm not too clued up on Java, but from what I understand, things like Enterprise Beans and most of the decent UI libraries are not shipped with the basic Java install. With .NET you get Enterprise Services, Windows Forms, ASP.NET and the kitchen sink all together.

Why can't I just statically link my libraries?
Well, there are actually tools that allow this, but there are two good reasons not to. One's a bit superfluous though. The superfluous one is space & bandwidth. What's smaller?
1) 1 x 25MB file + 15 x 100K files
2) 15 x 2,100K files (2MB of framework linked to each file)

If a person is only going to use the library in one program, then statically linking makes the most sense. However, I've heard from a very unreliable source that the MS vision is going to be changed to "craploads of .NET programs in every home". If they all statically-linked the BCL, hope to hell that TB hard drives become affordable soon.
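
For the record, here's the arithmetic, treating 25MB as 25,000K to keep the sums round (the function names and that rounding are mine):

```python
def shared_install_kb(programs, framework_kb=25_000, program_kb=100):
    """One shared framework install plus a small file per program."""
    return framework_kb + programs * program_kb

def static_install_kb(programs, linked_program_kb=2_100):
    """Every program carries its own 2MB slice of the framework."""
    return programs * linked_program_kb

shared_15 = shared_install_kb(15)
static_15 = static_install_kb(15)

# Smallest number of programs at which the shared install wins.
break_even = next(n for n in range(1, 1000)
                  if static_install_kb(n) > shared_install_kb(n))

print(shared_15, "K shared vs", static_15, "K statically linked")
print("shared install is smaller from", break_even, "programs upwards")
```

Below the break-even point static linking is the smaller option, which is exactly why the space argument is the superfluous one.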

The biggest reason, in my opinion, is security. To be precise, Code Access Security (CAS). For those of you who may have slept through .NET boot camp, this is the funky little creature that stops your programs working. Or, to be more correct, stops scummy untrusted code from violating your computer in unspeakable ways.

Hmmm, but what's untrusted code? Well, it's code that either can't prove its parentage, or comes from a shady background. The kind of code you'd be unhappy about your daughter dating, in other words. Now, before you load up that shotgun, keep in mind that the code could actually be very good, trustworthy code. But how would you know? One way would be to establish its background, another would be to give it only a little rope to hang itself with. .NET does a combination of both.

It allows the untrusted code to call and use libraries that are trusted. In other words, it lets the code take your daughter out to restaurants where you're friends with the owner. It doesn't let the code do untrusted things, like calling Windows APIs directly, or performing COM Interop, or taking your daughter to Lovers Lane.

Now here's a question (and where my analogy breaks down), how does it know that a library is trusted? Well, said library must be digitally signed for starters. If you can't tell if the library has been changed, then trusting that library is a bit silly. The library must also come from a reputable background. Alternatively the library must have been explicitly trusted. Now, if the library is trustworthy as well as trusted, it'll make sure that code calling it has an appropriate amount of trust before allowing said call to happen. A great example is System.Windows.Forms. It allows untrusted code to do things like show Common Dialogs. Internally System.Windows.Forms uses the Win32 API to do these things. Since System.Windows.Forms is trusted, this is allowed, and the framework trusts the library to ensure that untrusted code doesn't do something daft or dangerous.

So now your untrusted code is allowed to call trusted libraries that may or may not allow certain actions dependent on the untrusted code's permissions. The more trusted your code is, the more the libraries will allow it to do. I'm not going to get into evidence, which is how the Framework decides what level of trust to give your code. Partly because it's boring, partly because my daughter analogy will help you grasp the concept, and mostly because I slept through that section of the documentation. Twice.
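
A toy model of that arrangement, for the sceptics. This is not the real CAS machinery (which walks the call stack and works with permission objects); it is just a trusted library function that checks what its caller has been granted before doing the dangerous bit. The function, the exception and the permission name are all stand-ins invented for the sketch:

```python
class SecurityError(Exception):
    """Raised when a caller lacks the permission a library call demands."""

def show_open_file_dialog(caller_permissions):
    # Hypothetical trusted library call: it may touch the native API itself,
    # but first demands that its caller was granted the right permission.
    if "FileDialog" not in caller_permissions:
        raise SecurityError("caller was not granted FileDialog")
    return "dialog shown"

partially_trusted = {"FileDialog"}   # granted a little rope
untrusted = set()                    # granted nothing at all

allowed = show_open_file_dialog(partially_trusted)

try:
    show_open_file_dialog(untrusted)
    denied = False
except SecurityError:
    denied = True

print(allowed, "/ denied for untrusted caller:", denied)
```

The caller never touches the native API itself; it only gets whatever the trusted library is willing to do on its behalf.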

But what does all this have to do with static linking, I hear you cry? Well, if we were to allow you to statically link in portions of trusted assemblies, and ship them as part of your code, how could it stay trusted? You can't sign it, because you don't have the private keys used to sign the assembly. Giving out said private keys would be tantamount to removing CAS from .NET. So, the library components you linked into your application would have to have the same level of trust as your application. If your application is untrusted, that means no P/Invoke or COM Interop. So we can kiss showing dialogs, windows, messageboxes and suchlike goodbye. Ditto with console output, file access, and network connectivity. We couldn't allow it too much processor time, or too much RAM. So basically what your program would be able to do is take data hard coded into it, do some operations with it, and toss the result away. Not terribly useful.

Whew! I'm writing an essay, and I'm still not finished with my ranting.

I just had a very nasty surprise with a software product I purchased, and I thought I'd just fulminate for a bit on a trend in software which has been accelerating for some time. This trend is the "screw your customers" trend. It manifests itself in many ways, and I'll discuss some below. Basically the principle across these methods is the same: you promise something and deliver something else entirely. This is just plain wrong; it's dishonest and it's unethical. Whilst most of these methods leave the customer with no legal recourse against you, they certainly don't encourage customers to purchase your software again. Plus, it would be best to keep in mind that some of these practices are so unethical that it is likely that they'll be legislated against sometime in the near future. Now criminalization is not retrospective, but who really wants to be known as the guys who were forced to change their business model because it became illegal?

EULAs
End-User License Agreements are (in my opinion) the biggest waste of time and effort of any software product. They can all basically be boiled down into a few bullet points:


  1. Whatever happens, it's not our fault, it's yours. No matter what.
  2. You thought this product belongs to you, but it doesn't, it's ours and we're just letting you use it for a while.
  3. We're going to tell you what you may and may not do with this product, since it's not yours we can give you any instruction we like.
  4. We don't trust you at all, not one iota. But you can trust us, promise.
  5. We hate your guts and wish you were dead.
I challenge you to find a single EULA of a commercial product that doesn't fit largely to the themes above. Now, I don't know about you, but I find the above highly offensive. The thing is that it's written into such a long stream of legalese that most people have no idea that they're getting completely shafted.
Can you imagine car manufacturers making you sign a EULA with point 1, and then skimping on safety features? They'd be in jail in seconds, or out of business. Can you imagine a CD manufacturer telling you that you could only play their CD's at certain times of day, on the manufacturers HiFi's only, and only after you'd phoned a number to confirm that you weren't a thief? Hah! CD manufacturers may want to go that way, and DVD manufacturers have a bit with region-encoded DVD's, but they can't take it as far as software companies do every day. Better yet, can you imagine a book seller trying to tell you that you don't actually own the book, but are just leasing it for a while, and that if you mistreat the book, they'll take it away?
Activation & CD-Keys
CD-Keys are an upfront method to avoid software piracy, and I don't have any problem with them. But please make sure they work. I recently bought a game (which shall remain nameless to protect the guilty), and the CD Key did not validate. It made me hopping mad, and I'll be returning it to the store and loudly complain that they're trying to sell me pirated software. It'd not the stores fault, but hopefully they'll put pressure on the distributer.
Activation is an after install method to avoid software piracy, and again I don't have a problem with it in principle. But activation does require an Internet connection, and not all of us are fortunate enough to have always on Internet access. Activation should be able to work from a different machine, since the person might have to go to an Internet Cafe to activate their product. Also, related to this, the product should work for a time without activation (e.g. Windows/Office). In the case of games, which some people play from start to finish you could require activation to unlock all the levels, but allow the people to play the game to say, level 2 without activation. The reason is that I've already entered the CD-Key to prove that the game is legit. Telling me that I still can't play it at all despite that just makes me see red.
Updates
I like the fact that a lot of programs automatically connect to the Internet to assess whether the program has any updates available, and offer to download them for you. However, this should not be required for the program to function, especially not if the updates are huge (and out of the First World a huge update is 50mB+). Give the user the option, don't force them. I have ADSL, and it has a cap. So this selfsame game mentioned above requires me to download some gigantic file before it even tell me that my valid CD-Key is invalid. But what if I was close to my cap, and I'd rather update the game on the 1st of the month. Tough luck buddy, you can't play the game till then. Guaranteed to lose you customers.
Ownership
Software companies like to think that even though they give you a box, CDs and manuals when you pay for their product, somehow the product does not really belong to you. This is absolute rubbish. Unless the packaging explicitly states in big bold letters that the contained product is just a temporary lease, I feel that the product is mine. Software manufacturers are confusing intellectual property with property. The code and IP for the game are indeed the manufacturer's, but the purchased game is mine. Sony cannot tell me that I can't take a sledgehammer to their TVs, but they can tell me not to reverse engineer one, or repackage it. That's standard IP, patent and copyright law. But the TV is mine. If I want to sell it to someone else, I can. If I want to paint it, I can. If I want to move it to another room, I can. Because it's mine.
Software companies seem to feel that because they own the IP, they also own the product. This is rubbish. I know that code is a tricky one because the IP and the product are hard to tell apart. The product is the CDs, packaging and contents that I bought. That gives me the exclusive right to use those as I see fit, not as the software company sees fit.
Conclusion
The above bad practices may be technically legal, but they do leave a bit of a sour taste in the mouth, don't they? They leave one with the impression that many software companies are ripping us off, pushing us around, and generally treating us like dirt. So what's the solution? I'm not going to advocate boycotting. What I am going to advocate is that those of us who are not unethical, and who actually appreciate our customers, don't fall into the traps above.
  • Provide easy-to-understand EULAs (or if you have to have a long EULA, summarize it above the main body).
  • If you're only leasing software, make sure that this is made very clear to the customer up front. Dishonesty will only breed unhappy customers (and irate blog entries).
  • Make sure that the legally purchased software can be up and running on the user's PC within minutes of the installation completing.
  • Don't force users to do things they may not wish to do (except pay of course ;D). Suggest the best course of action, and make it easy to do what you want them to do, but forcing them just makes some customers irritated.
  • If you're going to sell your software outside of the First World, keep in mind the fact that very few Third World users have dial-up, let alone broadband.
  • Keep in mind that many people consider applications that quietly send information to the seller to be spyware, no matter how little information is sent. Also keep in mind that some of those users have firewalls that tell them that your program is being sneaky and underhanded.
I recently needed to validate a South African Identity Document Number. There's not a whole lot of information on this on the web, but Jacques Marneweck had an interesting couple of posts about this. I managed to track down some more detail from Home Affairs, so here we go.
A Regular Expression to parse the ID Number is shown below:

([0-9][0-9])(([0][1-9])|([1][0-2]))(([0-2][0-9])|([3][0-1]))([0-9])([0-9]{3})([0-9])([0-9])([0-9])

The first six digits are the date of birth in YYMMDD format. The seventh digit is the gender: 0-4 for Female, 5-9 for Male. The 8th - 10th digits are the series, which is your index in the number of people with that DOB/Gender combo allocated. The 11th digit is the citizenship number, which is 0 for South Africans and 1 for foreigners. The 12th digit is generally either 8 or 9, but this is not guaranteed. The 13th digit is a control digit, with a nasty algorithm to validate it.
  • Calculate total A by adding the figures in the odd positions i.e. the first, third, fifth, seventh, ninth and eleventh digits.
  • Calculate total B by reading the figures in the even positions (the second, fourth, sixth, eighth, tenth and twelfth digits) as one whole number, multiplying that number by 2, and then adding the individual figures of the result together.
  • Calculate total C by adding total A to total B.
  • The control figure can now be determined by subtracting the ones digit of total C from 10.
  • Where the total C is a multiple of 10, the control figure will be 0.
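
The steps above can be sketched in a few lines. This is a rough illustration in Python, not the C# class mentioned in this post, and it is only lightly checked. The group names in the pattern and the function names `control_digit` and `parse_id` are my own labels rather than anything from Home Affairs, and the ID number used in the usage note is made up for the example.

```python
import re

# Same shape as the pattern in the post. The group names are assumptions
# chosen to match the field descriptions, not official terminology.
ID_PATTERN = re.compile(
    r"(?P<year>[0-9]{2})"
    r"(?P<month>0[1-9]|1[0-2])"
    r"(?P<day>[0-2][0-9]|3[0-1])"
    r"(?P<gender>[0-9])"
    r"(?P<series>[0-9]{3})"
    r"(?P<citizenship>[0-9])"
    r"(?P<digit12>[0-9])"
    r"(?P<control>[0-9])"
)


def control_digit(first_twelve: str) -> int:
    """Compute the 13th (control) digit from the first 12 digits."""
    if re.fullmatch(r"[0-9]{12}", first_twelve) is None:
        raise ValueError("expected exactly 12 digits")
    # Total A: sum of the digits in positions 1, 3, 5, 7, 9 and 11.
    total_a = sum(int(d) for d in first_twelve[0::2])
    # Total B: read positions 2, 4, ..., 12 as one number, double it,
    # then add up the digits of the doubled number.
    even_number = int(first_twelve[1::2])
    total_b = sum(int(d) for d in str(even_number * 2))
    total_c = total_a + total_b
    # 10 minus the ones digit of C; the outer % 10 maps a result of 10 to 0.
    return (10 - total_c % 10) % 10


def parse_id(id_number: str):
    """Split an ID number into fields, or return None if it does not match."""
    match = ID_PATTERN.fullmatch(id_number)
    if match is None:
        return None
    fields = match.groupdict()
    # Seventh digit: 0-4 is female, 5-9 is male.
    fields["gender"] = "female" if int(fields["gender"]) <= 4 else "male"
    fields["control_ok"] = control_digit(id_number[:12]) == int(fields["control"])
    return fields
```

For the made-up number 8001015009087, total A is 13, total B is 20 (11098 doubled is 22196), total C is 33, and so `control_digit("800101500908")` returns 7, which matches the last digit.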

I've written a class to validate and handle ID numbers in C#. If anyone wants a copy, all you need do is ask. Please note that I haven't tested this to any serious degree though.
I broke one of the main rules of programming the other day. I'm writing a control which will host a COM control. Now, this COM control has a very chatty interface. So I thought, why not write an ATL shim between my .NET control and this COM control that will handle the chatty aspects, and only pass up the important stuff? So I duly went along and spent a week and a half implementing this C++ shim.

Now the problem is that while I knew each Interop call would cost me cycles, and I knew that there were quite a few Interop calls, I hadn't profiled the problem. I came to my senses a couple of days ago, and wrote a pure .NET version in just a couple of days. I profiled the two, and what do you know, the performance differential between them is minuscule. Turns out that most of the chattiness is at startup and teardown, and much less during operation.

A typical example of premature optimization. You can't reliably know which sections are going to be performance-critical until you profile the system. I should have written it in .NET, profiled it, and if there were bottlenecks, optimized those.
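
To make the "measure first" point concrete, here is a crude sketch of timing the startup, operation and teardown phases separately. It is in Python rather than .NET, it is a stopwatch rather than a real profiler, and `time_phases` and the stand-in callables are hypothetical names invented for the illustration.

```python
import time


def time_phases(setup, operate, teardown, iterations=1000):
    """Return wall-clock seconds spent in each phase of a component's life."""
    timings = {}

    start = time.perf_counter()
    state = setup()
    timings["setup"] = time.perf_counter() - start

    # Time the steady-state work as one batch of repeated calls.
    start = time.perf_counter()
    for _ in range(iterations):
        operate(state)
    timings["operate"] = time.perf_counter() - start

    start = time.perf_counter()
    teardown(state)
    timings["teardown"] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Hypothetical stand-ins: a slow startup followed by cheap calls.
    def slow_setup():
        time.sleep(0.05)
        return []

    result = time_phases(slow_setup, lambda s: s.append(1), lambda s: s.clear())
    for phase, seconds in result.items():
        print(f"{phase}: {seconds:.4f}s")
```

Running two candidate implementations through the same harness shows where the time actually goes before any of it gets rewritten.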

One more little performance tip for Interop. The main library I was using was MSHTML. It turns out that an awful lot of my startup time was going into loading this gigantic Interop library: 7.63MB, or almost 3 times the size of mscorlib. That's a lot, and all I needed was a subset. A little scratching around on the Web, a bit of Reflector, and all of a sudden I had the interfaces I wanted in 152KB. This is quite a useful trick when dealing with Interop. The Primary Interop Assembly is not an all-or-nothing proposition.

When using Reflector to look at interop assemblies, it's worth noting that Reflector puts property get accessors before property sets, so if you're using it as a guide, make sure that you check the IDL for the correct ordering.