Nandos has “silently” changed the recipe for their sauces. Mild is no longer mild, but “hotter” and the old mild no longer exists. Lemon and herb tastes slightly more like lemons – the dose of lemon in the “lemon and herb” sauce has apparently been increased. Apparently, due to inconsistencies across the stores, they have reverted to the “original” Nandos sauces.
There is often the need to parse delimited data in SQL Server. For instance, a comma separated values (CSV) list of Id's from a multi-select Reporting Services report parameter, or a data field that is stored as a pipe-separated list of strings.
Here are two T-SQL (2005 or later) user defined functions (UDF's) to generically parse any delimited list of values.
You supply the delimited list of values as an nvarchar(max) argument -- this caters for Unicode and the maximum possible length (@Values).
You also supply a delimiter string of one or more Unicode characters, also passed as an nvarchar(max) parameter (@Separator). For instance you supply ',' for CSV and '|' for pipe delimited.
The UDF_ParseArray UDF is a table valued function that returns a table that has the following definition:
RETURNS
@Array TABLE (
[Index] int NOT NULL IDENTITY(0, 1) PRIMARY KEY,
[Value] nvarchar(max) NOT NULL
)
The Index column is the 0-based array index (or field index). The Value column is the actual value of the array entry (field) as a Unicode string up to the maximum possible length.
The UDF_ParseArrayIndex UDF is a scalar valued function that returns a particular array entry given the array index. In addition to the two arguments described above (@Values and @Separator), you supply the array index argument (@Index, 0-based, int).
Source code:
CREATE FUNCTION [dbo].[UDF_ParseArray]
(
@Values nvarchar(max),
@Separator nvarchar(max)
)
RETURNS
@Array TABLE (
[Index] int NOT NULL IDENTITY(0, 1) PRIMARY KEY,
[Value] nvarchar(max) NOT NULL
)
AS
BEGIN
DECLARE @Pattern nvarchar(max);
DECLARE @Offset int;
DECLARE @PatIndex int;
SET @Values = @Values + @Separator;
SET @Pattern = N'%' + @Separator + N'%';
SET @Offset = LEN(@Separator) - 1;
SET @PatIndex = PATINDEX(@Pattern, @Values);
WHILE @PatIndex <> 0
BEGIN
INSERT INTO @Array ([Value]) VALUES (LEFT(@Values, @PatIndex - 1));
SET @Values = STUFF(@Values, 1, @PatIndex + @Offset, N'');
SET @PatIndex = PATINDEX(@Pattern, @Values);
END;
RETURN;
END;
GO
CREATE FUNCTION [dbo].[UDF_ParseArrayIndex]
(
@Values nvarchar(max),
@Separator nvarchar(max),
@Index int
)
RETURNS nvarchar(max)
AS
BEGIN
DECLARE @Pattern nvarchar(max);
DECLARE @Offset int;
DECLARE @ArrayIndex int;
DECLARE @PatIndex int;
SET @Values = @Values + @Separator;
SET @Pattern = N'%' + @Separator + N'%';
SET @Offset = LEN(@Separator) - 1;
SET @ArrayIndex = 0;
SET @PatIndex = PATINDEX(@Pattern, @Values);
WHILE @PatIndex <> 0
BEGIN
IF (@ArrayIndex = @Index)
RETURN LEFT(@Values, @PatIndex - 1);
SET @Values = STUFF(@Values, 1, @PatIndex + @Offset, N'');
SET @PatIndex = PATINDEX(@Pattern, @Values);
SET @ArrayIndex = @ArrayIndex + 1;
END;
RETURN NULL;
END;
GO
Examples:
SELECT * FROM [dbo].[UDF_ParseArray](N'The,Big,Brown,Fox,Jumped,Over,The,Cow', ',');
CREATE TABLE [dbo].[Data] (
[Id] int NOT NULL PRIMARY KEY,
[Value] nvarchar(max) NOT NULL,
[Array] nvarchar(max) NOT NULL
);
INSERT INTO [dbo].[Data] ([Id], [Value], [Array]) VALUES (1, 'Test1', 'Test1_Index0|Test1_Index1|Test1_Index2');
INSERT INTO [dbo].[Data] ([Id], [Value], [Array]) VALUES (2, 'Test2', 'Test2_Index0|Test2_Index1|Test2_Index2');
INSERT INTO [dbo].[Data] ([Id], [Value], [Array]) VALUES (3, 'Test3', 'Test3_Index0|Test3_Index1|Test3_Index2');
INSERT INTO [dbo].[Data] ([Id], [Value], [Array]) VALUES (4, 'Test4', 'Test4_Index0|Test4_Index1|Test4_Index2');
INSERT INTO [dbo].[Data] ([Id], [Value], [Array]) VALUES (5, 'Test5', 'Test5_Index0|Test5_Index1|Test5_Index2');
INSERT INTO [dbo].[Data] ([Id], [Value], [Array]) VALUES (6, 'Test6', 'Test6_Index0|Test6_Index1|Test6_Index2');
INSERT INTO [dbo].[Data] ([Id], [Value], [Array]) VALUES (7, 'Test7', 'Test7_Index0|Test7_Index1|Test7_Index2');
INSERT INTO [dbo].[Data] ([Id], [Value], [Array]) VALUES (8, 'Test8', 'Test8_Index0|Test8_Index1|Test8_Index2');
INSERT INTO [dbo].[Data] ([Id], [Value], [Array]) VALUES (9, 'Test9', 'Test9_Index0|Test9_Index1|Test9_Index2');
DECLARE @IdList nvarchar(max);
SET @IdList = N'1,3,5,7,9';
SELECT
[Data].[Id],
[Data].[Value],
[dbo].[UDF_ParseArrayIndex]([Data].[Array], N'|', 0) AS [ArrayIndex0],
[dbo].[UDF_ParseArrayIndex]([Data].[Array], N'|', 1) AS [ArrayIndex1],
[dbo].[UDF_ParseArrayIndex]([Data].[Array], N'|', 2) AS [ArrayIndex2]
FROM [dbo].[Data] AS [Data]
INNER JOIN [dbo].[UDF_ParseArray](@IdList, N',') AS [T]
ON [Data].[Id] = [T].[Value];
SELECT * FROM [dbo].[UDF_ParseArray](N'Field1<fs>Field2<fs>Field3', N'<fs>');

You can insert guidelines into the Visual Studio text editor to help hard wrapping code and code comments. A typical column wrap is at column 80.
1. Open the Registry Editor (Start, Run, regedit)
2. Open the registry key: HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\9.0\Text Editor
3. Add a new string value called "Guides" (right click, New, String Value)
4. Edit the string value (double click) and insert the text "RGB(192,192,192) 80" (excluding the quotes).
The format of this string is: RGB(<red>,<green>,<blue>) <column 1>[, <column n>] where:
<red>, <green> and <blue> are the red, green and blue components of the colour of the guideline you want, and <column 1> is the column number of the first guideline and <column n> is the column number of the nth guideline.
5. Restart Visual Studio.
I needed a way to test classes working with large data streams (greater than 4GB of data). I could've created a file greater than 4GB and used a FileStream. However, the performance and disk-space usage overhead was not acceptable. Thus I created the VirtualStream. The VirtualStream behaves like a NullStream (Stream.Null) in that it has no backing store, except that you can also read from it and therefore use it as a virtual source of data of any size. You can specify the initial length and position when constructing the VirtualStream and perform all stream operations, including Read, Seek, Write, Position, SetLength, etc. Another difference between VirtualStream and NullStream is that it validates the input parameters the same way that a FileStream or MemoryStream would, therefore being ideal for testing. In addition to validating all input parameters and allowing reads, you can specify that the VirtualStream simulate an actual read by filling the buffer passed to the Read method with random data. This is especially useful for testing classes that require a source of disparate data (not all zeros), for instance compression streams such as DeflateStream. I thought I'd share this class as it can save you a few minutes of development.
My specific scenario that I needed this for was testing my implementation of the Zip64 ZIP file specification. The Zip64 ZIP file specification adds additional 64-bit records to the ZIP local file header and central directory in order to cater for archives larger than 4GB or archives that contain files larger than 4GB.
using System;
using System.IO;
using System.IO.Compression;
namespace ConsoleApplication112
{
[Flags]
public enum VirtualStreamOptions
{
None = 0,
ReadRandomBytes = 1
}
internal static class VirtualStreamOptionsExtensions
{
internal static bool IsValid(this VirtualStreamOptions options)
{
return (options >= 0 && (int)options <= 1);
}
internal static bool IsSet(this VirtualStreamOptions options,
VirtualStreamOptions option)
{
return (options & option) != 0;
}
}
public class VirtualStream : Stream
{
private long length;
private long position;
private readonly VirtualStreamOptions options;
private readonly Random random;
public VirtualStream()
{ }
public VirtualStream(long initialLength)
: this(initialLength, 0, VirtualStreamOptions.None)
{ }
public VirtualStream(long initialLength, long initialPosition)
: this(initialLength, initialPosition, VirtualStreamOptions.None)
{ }
public VirtualStream(long initialLength, long initialPosition,
VirtualStreamOptions options)
{
if (initialLength < 0)
{
throw new ArgumentOutOfRangeException("initialLength");
}
if (initialPosition < 0)
{
throw new ArgumentOutOfRangeException("initialPosition");
}
if (!options.IsValid())
{
throw new ArgumentException("options");
}
length = initialLength;
position = initialPosition;
if (options.IsSet(VirtualStreamOptions.ReadRandomBytes))
{
random = new Random();
}
this.options = options;
}
public override bool CanRead
{
get { return true; }
}
public override bool CanSeek
{
get { return true; }
}
public override bool CanWrite
{
get { return true; }
}
public override void Flush()
{ }
public override long Length
{
get { return length; }
}
public override long Position
{
get { return position; }
set
{
if (value < 0)
{
throw new ArgumentOutOfRangeException("value");
}
position = value;
}
}
public override int Read(byte[] buffer, int offset, int count)
{
if (buffer == null)
{
throw new ArgumentNullException("buffer");
}
if (offset < 0)
{
throw new ArgumentOutOfRangeException("offset");
}
if (count < 0)
{
throw new ArgumentOutOfRangeException("count");
}
if ((buffer.Length - offset) < count)
{
throw new ArgumentException(
"offset subtracted from the buffer length is less than count.");
}
long bytesLeft = length - position;
if (bytesLeft <= 0)
{
return 0;
}
if (bytesLeft < count)
{
count = (int) bytesLeft;
}
if (options.IsSet(VirtualStreamOptions.ReadRandomBytes))
{
if (offset == 0 && count == buffer.Length)
{
random.NextBytes(buffer);
}
else
{
var tempBuffer = new byte[count];
random.NextBytes(tempBuffer);
Buffer.BlockCopy(tempBuffer, 0, buffer, offset, count);
}
}
position += count;
return count;
}
public override long Seek(long offset, SeekOrigin origin)
{
switch (origin)
{
case SeekOrigin.Begin:
if (offset < 0)
{
throw new ArgumentOutOfRangeException("offset",
"Seeking is attempted before the beginning of the stream.");
}
position = offset;
break;
case SeekOrigin.Current:
if ((position + offset) < 0)
{
throw new ArgumentOutOfRangeException("offset",
"Seeking is attempted before the beginning of the stream.");
}
position += offset;
break;
case SeekOrigin.End:
if ((length + offset) < 0)
{
throw new ArgumentOutOfRangeException("offset",
"Seeking is attempted before the beginning of the stream.");
}
position = length + offset;
break;
default:
throw new ArgumentException(
"There is an invalid SeekOrigin.", "origin");
}
return position;
}
public override void SetLength(long value)
{
if (value < 0)
{
throw new ArgumentOutOfRangeException("value");
}
length = value;
if (position > length)
{
position = length;
}
}
public override void Write(byte[] buffer, int offset, int count)
{
if (buffer == null)
{
throw new ArgumentNullException("buffer");
}
if (offset < 0)
{
throw new ArgumentOutOfRangeException("offset");
}
if (count < 0)
{
throw new ArgumentOutOfRangeException("count");
}
if ((buffer.Length - offset) < count)
{
throw new ArgumentException(
"offset subtracted from the buffer length is less than count.");
}
long num = position + count;
if (num < 0)
{
throw new IOException("Stream too long.");
}
if (num > length)
{
length = num;
}
position = num;
}
}
public static class StreamOperation
{
public static void Copy(Stream source, Stream target)
{
Copy(source, target, 0x1000);
}
public static void Copy(Stream source, Stream target, int bufferSize)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (target == null)
{
throw new ArgumentNullException("target");
}
if (bufferSize < 0)
{
throw new ArgumentOutOfRangeException("bufferSize");
}
var buffer = new byte[bufferSize];
int bytesRead;
while ((bytesRead = source.Read(buffer, 0, bufferSize)) != 0)
{
target.Write(buffer, 0, bytesRead);
}
}
}
class Program
{
static void Main()
{
using (var source = new VirtualStream(int.MaxValue + 1L, 0,
VirtualStreamOptions.ReadRandomBytes))
using (var target = new DeflateStream(new FileStream(@"c:\out.bin",
FileMode.Create, FileAccess.Write), CompressionMode.Compress))
{
StreamOperation.Copy(source, target);
}
}
}
}
WARNING: DRAFT, WORK-IN-PROGRESS
You may or may not be aware that since .NET 2.0, the .NET System.Threading.ThreadPool class (thread pool) has supported I/O completion ports -- available on Windows 2000 and later. This addition to the thread pool means a significant boost in performance of asynchronous I/O operations. However, it is not immediately obvious how to go about using this new functionality in your own class libraries and applications.
Firstly, some background on the basic architecture of the .NET thread pool.
There is only ONE thread pool per process (not per AppDomain as may be thought). This actually improves performance of I/O operations as we shall see later, by channeling all .NET framework asynchronous I/O operations, for all AppDomains in a process, through the same point.
Why a thread pool? Creating and destroying threads, while pretty quick these days, is still not free in terms of performance. A thread pool recycles threads after they are used, instead of repeatedly creating and destroying threads. The thread pool controls the minimum and maximum number of threads in the pool and automatically grows and shrinks the thread pool as needed.
To give you an idea of what is involved in creating and destroying threads, when a thread is created, the Windows kernel allocates an internal object to represent the thread, allocates memory for the thread's stack and initialises the thread's stack memory to zero. If your process has any DLLs loaded (which it typically will have several), each DLL entry point is called specifying DLL_THREAD_ATTACH. This has two implications. Firstly, Windows must load pages from disk into memory in order to execute the DLL entry point code (resulting in page faults) and then the actual DLL entry point user code must be executed. The DLL entry point user code may incur significant overhead depending on what the code actually does. This is repeated for every DLL loaded in the process whether your thread will use that DLL or not!
To destroy a thread, the entry point to every DLL loaded in the process must again be called, specifying DLL_THREAD_DETACH. Again, Windows needs to page-fault pages from disk into memory and then execute the required user code, which may incur significant overhead. The thread's stack memory must be freed and the kernel thread object must be freed if there are no more references to it.
All of this overhead just to create and destroy a single thread, before it is actually even scheduled to perform work.
To get any kind of scalable performance, a thread pool is the only way to go. Further, using multiple thread pools concurrently DOES NOT improve performance through-put, contrary to popular belief, as we shall see when we look at how asynchronous I/O operations are handled. It is a common request from developers to be able to create additional thread pools. While this may seem appealing, there are performance downsides to doing this and this is why the .NET framework team has consciously excluded this functionality from .NET. You should almost always use the System.Threading.ThreadPool for asynchronous I/O. There are a few rare cases where it makes sense to have a separate thread pool, but this is certainly NOT the general case.
The thread pool has four types of threads. They are:
- Worker Thread
- Completion Port Thread
- Wait Thread
- Timer Manager Thread
The worker thread is the one we all should be immediately familiar with. You are able to run tasks on a thread pool worker thread using ThreadPool.QueueUserWorkItem. There is currently a default maximum of 500 worker threads, but this may change in the future. You can also explicitly set the maximum number of worker threads using ThreadPool.SetMaxThreads.
The completion port threads are threads that are specifically designed to be used for asynchronous I/O operations. These threads wait on a single I/O completion port (IOCP). The .NET thread pool has exactly ONE IOCP. There is currently a default maximum of 1,000 IOCP threads, but this may change in the future. You can also explicitly set the maximum number of IOCP threads using ThreadPool.SetMaxThreads. You cannot set a maximum value less than the number of CPUs in the machine -- we will see why below.
NOTE: Some hosts may prevent you from changing the thread pool min and max threads, e.g. IIS and SQL Server.
The worker threads and IOCP threads are the only two thread pool threads that actually can run user code. The wait threads and timer threads are used internally by the thread pool to manage waiting on a wait handle and timer based tasks respectively.
When you call ThreadPool.RegisterWaitForSingleObject, you specify a WaitHandle object, as well as a delegate (callback) to the method you want to execute, a timeout and if the delegate must be executed only once or multiple times. The thread pool allocates a wait thread and calls WaitHandle.WaitOne(timeout) on the wait handle, and for the timeout, you passed to RegisterWaitForSingleObject. When the wait handle is signaled, e.g. by another thread calling AutoResetEvent.Set(), or the timeout elapses (note, the timeout may be 0 -- immediate, or -1 -- infinite), the wait thread queues your delegate (callback) to an IOCP thread pool thread for execution. If you specified execute only once in the call to RegisterWaitForSingleObject, the wait thread exits and returns to the thread pool to be reused, otherwise, the wait thread again calls WaitHandle.WaitOne(timeout). Note that your callback is executed on an completion port thread and NOT a worker thread.
Individual wait threads are not actually allocated to every single call to RegisterWaitForSingleObject. Wait threads are shared. They achieve this by internally using WaitForMultipleObjects, waiting on many wait handles. This reduces the number of wait threads required to service signaled or timeout based tasks. Currently, a single wait thread can service 63 wait handles, but this may change in the future. So if you had 1,000 wait handles registered, the thread pool uses only 16 threads to wait on those 1,000 wait handles.
Firstly, we should identify when it is appropriate to use the thread pool IOCP threads instead of the thread pool worker threads.
If you have some task that you want to execute asynchronously (in the background), the typical approach is to call ThreadPool.QueueUserWorkItem. This is completely valid for tasks that run code that utilise the CPU intensely for a relatively short period to perform their work and then finish. Some tasks though, utilise the CPU to initiate a task and then wait while some "device" performs I/O. When the "device" completes the required I/O, the CPU is again used to process the result. The latter type of task is classified as an I/O operation. I/O operations are characterised by the initial use of the CPU to initiate the I/O (the I/O request), then a dead period where the CPU is idle while the I/O is taking place (I/O typically takes place on some hardware device), and then the use of the CPU to process the response after the I/O completes (the I/O response).
If you use a normal thread pool worker thread, using Thread.QueueUserWorkItem, to process an I/O operation, that thread is blocked for the entire duration of the I/O task. However, the thread could have been used to do something else while it was waiting for the I/O operation to complete, as it is essentially idle during this time.
This is where IO Completion Ports (IOCP's) come to the rescue.
----
Since this article is currently incomplete, here is the code to queue a work item to the thread pool IOCP threads. The explanation will follow later. There is also a more complicated version of this using a Queue to queue the state rather than packing the IAsyncResult object into the Overlapped object, which could be more efficient. I will post the second version later when I complete this article.
public static class ThreadPoolHelper
{
private sealed class CallbackState : IAsyncResult
{
private readonly WaitCallback callback;
private readonly object state;
internal CallbackState(WaitCallback callback, object state)
{
this.callback = callback;
this.state = state;
}
internal void InvokeCallback()
{
callback(state);
}
#region IAsyncResult Members
public object AsyncState
{
get { throw new NotSupportedException(); }
}
public WaitHandle AsyncWaitHandle
{
get { throw new NotSupportedException(); }
}
public bool CompletedSynchronously
{
get { throw new NotSupportedException(); }
}
public bool IsCompleted
{
get { throw new NotSupportedException(); }
}
#endregion
}
private static unsafe void IOCompletionCallback(uint errorCode, uint numBytes,
NativeOverlapped* pOVERLAP)
{
try
{
((CallbackState)Overlapped.Unpack(pOVERLAP).AsyncResult).InvokeCallback();
}
finally
{
Overlapped.Free(pOVERLAP);
}
}
public static bool UnsafeQueueIOCompletion(WaitCallback callback, object state)
{
if (callback == null)
{
throw new ArgumentNullException("callback");
}
unsafe
{
return ThreadPool.UnsafeQueueNativeOverlapped(
new Overlapped(0, 0, IntPtr.Zero,
new CallbackState(callback, state)).UnsafePack(
IOCompletionCallback, null));
}
}
}
Test console application:
class Program
{
static void DisplayThreadInfo()
{
int workerThreads, completionPortThreads;
ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
Console.WriteLine("id={0},workerThreads={1},completionPortThreads={2}",
Thread.CurrentThread.ManagedThreadId, workerThreads, completionPortThreads);
}
static void IOCallback(object state)
{
Console.WriteLine((string)state);
DisplayThreadInfo();
}
static void Main()
{
DisplayThreadInfo();
ThreadPoolHelper.UnsafeQueueIOCompletion(IOCallback, "Hello World from IOCP thread pool thread.");
Thread.Sleep(1000);
Console.WriteLine("Press a key to exit...");
Console.ReadKey(true);
}
}
"Transparent data encryption (TDE) performs real-time I/O encryption and decryption of the data and log files. The encryption uses a database encryption key (DEK), which is stored in the database boot record for availability during recovery. The DEK is a symmetric key secured by using a certificate stored in the master database of the server or an asymmetric key protected by an EKM module. TDE protects data "at rest", meaning the data and log files. It provides the ability to comply with many laws, regulations, and guidelines established in various industries. This enables software developers to encrypt data by using AES and 3DES encryption algorithms without changing existing applications."
Check List:
- Backup the SERVICE MASTER KEY. This is the main key for the whole SQL Server instance.
- Keep the backup file in a secure, offsite location.
- Keep the password for the backup file in a secure, offsite location.
BACKUP SERVICE MASTER KEY TO FILE = 'path_to_file'
ENCRYPTION BY PASSWORD = 'password';
GO
- Create the "master" database MASTER KEY encrypted by a password and the SERVICE MASTER KEY.
USE [master];
GO
CREATE MASTER KEY
ENCRYPTION BY PASSWORD = 'password';
GO
- Backup the "master" database MASTER KEY. This is the main key for the "master" database used to ultimately encrypt your database encryption key.
- Keep the backup file in a secure, offsite location.
- Keep the password for the backup file in a secure, offsite location.
USE [master];
GO
BACKUP MASTER KEY TO FILE = 'path_to_file'
ENCRYPTION BY PASSWORD = 'password';
GO
- Create the database encryption key (DEK) server certificate encrypted by the "master" database MASTER KEY.
USE [master];
GO
CREATE CERTIFICATE certificate_name
WITH SUBJECT = 'certificate_subject_name';
GO
- Backup the database encryption key (DEK) server certificate with the private key.
USE [master];
GO
BACKUP CERTIFICATE certificate_name
TO FILE = 'path_to_file'
WITH PRIVATE KEY (
FILE = 'path_to_private_key_file',
ENCRYPTION BY PASSWORD = 'encryption_password'
);
GO
- Create the database encryption key (DEK) encrypted by the server certificate.
USE [database_name];
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE certificate_name;
GO
- Enable Transparent Data Encryption (TDE) on your database.
ALTER DATABASE database_name
SET ENCRYPTION ON;
GO
The .NET ECMA specified memory model is weaker than the x86 hardware memory model. When you run your multi-threaded .NET applications on a multi-processor machine with a weak memory model, such as the Intel Itanium Architecture (IA64), your programs may not execute as expected.
This is because weaker memory models can reorder memory operations (loads and stores) out of program sequence in order to improve performance. This reordering is guaranteed to have no visible side-effects on a single thread of execution. However, if you are accessing memory across threads, and the stores to, and loads from that memory are reordered, this may produce undesirable side-effects, effectively breaking your well-crafted application.
However, if all access to memory shared across threads is completely protected by a synchronisation primitive such as the System.Threading.Monitor class using the C# lock keyword, then you will never experience any of these side-effects. This is because .NET does all the work under the hood to ensure that the appropriate memory barriers are in place. Acquiring a lock in C# for instance implements a read memory barrier on entry, while releasing a lock in C# implements a write memory barrier on exit.
There are however cases where performance is critical and we therefore opt for a low-lock synchronisation strategy. A low-lock synchronisation strategy tries to avoid locking wherever possible to improve performance. As soon as you enter this world though, you are treading in dangerous territory and you need to be aware of memory barriers -- see System.Threading.Thread.MemoryBarrier.
A memory barrier also known as a memory fence ensures that memory operations that precede the memory barrier do not get reordered to occur after the memory barrier and memory operations that occur after the memory barrier do not get reordered to occur before the memory barrier.
.NET only offers a full memory fence that is a read (load) and write (store) memory barrier (fence) in one, via System.Threading.Thread.MemoryBarrier. Some hardware platforms offer separate read and write memory barriers, which can further improve performance.
int a = 1; // store
int b = 2; // store
int c = 3; // store
In the code above, the program sequence for the three memory operations is as follows: first store 1 in the memory location represented by 'a', then store 2 in the memory location represented by 'b' and finally store 3 in the memory location represented by 'c'.
In a weak memory model (e.g. the .NET memory model on IA64), these memory store operations can be reordered in any order, e.g.
int b = 2; // store
int a = 1; // store
int c = 3; // store
This reordering has no effect on the outcome, on a single thread of execution, and therefore the program still proceeds as expected. However, if another thread (thread B) was relying on 'a' being set before 'b' gets set, then in the reordered example, because 'b' is set before 'a', thread B would see another value for 'a' (possibly 0) instead of 1.
To correct this program, we can either place a lock around all accesses to 'a' and 'b', thus implicitly inserting memory barriers, or we could explicitly insert a memory barrier between 'a' and 'b' as follows:
int a = 1; // store
Thread.MemoryBarrier();
int b = 2; // store
int c = 3; // store
Now, the store to 'a' cannot move down past the memory barrier and the stores to 'b' and 'c' cannot move up past the memory barrier. However, the stores to 'b' and 'c' can still be reordered after the memory barrier as follows:
int a = 1; // store
Thread.MemoryBarrier();
int c = 3; // store
int b = 2; // store
The C# volatile keyword implements acquire and release semantics, which implies a read memory barrier on read and a write memory barrier on write.
You can also use System.Threading.Thread.VolatileRead and System.Threading.Thread.VolatileWrite to implement acquire and release semantics respectively. However, these API's simply make a call to System.Threading.Thread.MemoryBarrier and are slower than simply calling System.Threading.Thread.MemoryBarrier yourself. Therefore, they have little practical application, since the reason you resort to memory barriers in the first place is performance.
The System.Threading.Interlocked class API's also implicitly apply the appropriate memory barriers on your behalf. These API's are relatively quick too.
Consider the following simple program that relies on a shared 'stop' flag to stop a thread from executing:
class Program
{
static bool stop = false;
static void Main()
{
Thread thread = new Thread(Job);
thread.Start();
Thread.Sleep(2000);
stop = true;
}
static void Job()
{
int counter = 0;
while (!stop)
{
Console.WriteLine(counter++);
}
}
}
On x86 hardware, this program will execute as expected. On multi-processor IA64 hardware, this application may never end. This is because the thread running 'Job' may decide to load the value of 'stop' (initially false) into the processor cache and never reload it -- blissfully unaware that another thread (the main application thread), running on another processor, has changed the value of stop to 'true'.
Likewise, the 'Job' method may never get to execute if the assignment of 'true' to 'stop' is moved before the call to Thread.Start().
To correct this program we need to lock around all accesses to 'stop'.
class Program
{
static bool stop = false;
static readonly object syncRoot = new object();
static bool Stop
{
get
{
lock (syncRoot)
{
return stop;
}
}
set
{
lock (syncRoot)
{
stop = value;
}
}
}
static void Main()
{
Thread thread = new Thread(Job);
thread.Start();
Thread.Sleep(2000);
Stop = true;
}
static void Job()
{
int counter = 0;
while (!Stop)
{
Console.WriteLine(counter++);
}
}
}
Alternatively we could mark the 'stop' field volatile:
class Program
{
static volatile bool stop = false;
static void Main()
{
Thread thread = new Thread(Job);
thread.Start();
Thread.Sleep(2000);
stop = true;
}
static void Job()
{
int counter = 0;
while (!stop)
{
Console.WriteLine(counter++);
}
}
}
Alternatively we could use System.Threading.Thread.VolatileRead and System.Threading.Thread.VolatileWrite:
class Program
{
static int stop = 0;
static bool Stop
{
get
{
return Thread.VolatileRead(ref stop) != 0;
}
set
{
Thread.VolatileWrite(ref stop, value ? 1 : 0);
}
}
static void Main()
{
Thread thread = new Thread(Job);
thread.Start();
Thread.Sleep(2000);
Stop = true;
}
static void Job()
{
int counter = 0;
while (!Stop)
{
Console.WriteLine(counter++);
}
}
}
Notice though that there are no overloads for System.Threading.Thread.VolatileRead and System.Threading.Thread.VolatileWrite that take a Boolean parameter, so we switched to using an int instead.
Alternatively we could use System.Threading.MemoryBarrier explicitly:
class Program
{
static bool stop = false;
static bool Stop
{
get
{
bool localStop = stop;
Thread.MemoryBarrier();
return localStop;
}
set
{
Thread.MemoryBarrier();
stop = value;
}
}
static void Main()
{
Thread thread = new Thread(Job);
thread.Start();
Thread.Sleep(2000);
Stop = true;
}
static void Job()
{
int counter = 0;
while (!Stop)
{
Console.WriteLine(counter++);
}
}
}
You can also creatively use the System.Threading.Interlocked class since all the interlocked operations have memory barriers as well -- this approach is significantly faster than using a System.Threading.Monitor class with the C# lock keyword:
class Program
{
static int stop = 0;
static bool Stop
{
get
{
return Interlocked.CompareExchange(ref stop, 0, 0);
}
set
{
Interlocked.CompareExchange(ref stop, value ? 1 : 0, value ? 0 : 1);
}
}
static void Main()
{
Thread thread = new Thread(Job);
thread.Start();
Thread.Sleep(2000);
Stop = true;
}
static void Job()
{
int counter = 0;
while (!Stop)
{
Console.WriteLine(counter++);
}
}
}
We've looked very briefly at the .NET memory model reordering and how you can combat it in your multi-threaded applications using the C# volatile keyword, the C# lock keyword, the System.Threading.Thread.MemoryBarrier API, the System.Threading.Thread.VolatileRead/VolatileWrite API's and the System.Threading.Interlocked class API's.
Here is a summary of all the possible reordering's with respect to one another in the .NET memory model:
In the first column is the first operation, while along the top row is the second operation. For example, a read (normal read -- NOT a volatile read) can reorder with respect to a read that follows it, but not with respect to a volatile write that follows it.
| |
Read |
Write |
Volatile Read |
Volatile Write |
Lock |
Unlock |
| Read |
Yes |
Yes |
Yes |
No |
Yes |
No |
| Write |
Yes |
Yes |
Yes |
No |
Yes |
No |
| Volatile Read |
No |
No |
No |
No |
No |
No |
| Volatile Write |
Yes |
Yes |
Yes |
No |
Yes |
No |
| Lock |
No |
No |
No |
No |
No |
No |
| Unlock |
Yes |
Yes |
Yes |
No |
No |
No |
- Normal reads and writes are freely reordered.
- Locks and unlocks are never reordered.
- Volatile reads and writes have acquire and release semantics. That is, operations after volatile-read cannot be moved to before the volatile-read and operations before volatile-write cannot be moved to after the volatile-write.
One classic low-lock strategy is the double-check lock pattern, for example:
public class Singleton
{
private static readonly object syncRoot = new object();
private static Singleton instance;
private readonly string message;
private Singleton(string message)
{
this.message = message;
}
public string Message
{
get { return message; }
}
public static Singleton Instance
{
if (instance == null)
{
lock (syncRoot)
{
if (instance == null)
{
instance = new Singleton("Hello World"); // memory barrier required
}
}
}
return instance;
}
}
class Program
{
static void Main()
{
Console.WriteLine(Singleton.Instance.Message);
}
}
This code uses the double-check lock pattern, to implement the lazy initialisation pattern to implement the singleton pattern.
In the 'Instance' property, the code prevents unnecessary locking by first checking if 'instance' is initialised or not. If it is initialised, it simply returns 'instance', otherwise it proceeds to initialise 'instance'. In order to ensure thread safety when initialising 'instance', a lock is taken and 'instance' is again checked for null before being initialised -- this is the double-check lock pattern.
Unfortunately, this code is broken on IA64.
The line that initialises 'instance' require two memory stores. The first (in program order) is to store the value "Hello World" to the memory location represented by the 'message' field of the 'Singleton' class. The second (in program order) is to store the newly constructed instance of the 'Singleton' class to the 'instance' field of the 'Singleton' class.
However, on IA64, these two memory store operations can be reordered, resulting in the instance of the 'Singleton' class being first assigned to the 'instance' field, before "Hello World" is assigned to the 'message' field. Thus another thread could see the 'instance' field as initialised and attempt to use it. However, the 'message' field may not yet have been set, thus the program could write out 'null' to the console.
To fix this, a memory barrier is required to force the order of the two inner memory store operations (store to 'message' and store to 'instance') to always occur in the correct order. The corrected program is as follows:
public static Singleton Instance
{
get
{
if (instance == null)
{
lock (syncRoot)
{
Singleton tempInstance = new Singleton("Hello World");
Thread.MemoryBarrier();
instance = tempInstance;
}
}
return instance;
}
}
Note that NOT all implementations of the double-check lock pattern require a memory barrier. It is only because the inner initialisation code had more than one memory store operation that must occur in order that a memory barrier is required. If your implementation of the 'Singleton' class did not have a store to the 'message' field, the memory barrier would not be necessary, since there would be only one store to the 'instance' field.
For example, the following is correct, without a memory barrier, even on IA64:
public class Singleton
{
private static readonly object syncRoot = new object();
private static Singleton instance;
private Singleton()
{ }
public string Message
{
get { return "Hello World"; }
}
public static Singleton Instance
{
if (instance == null)
{
lock (syncRoot)
{
if (instance == null)
{
instance = new Singleton();
}
}
}
return instance;
}
}
class Program
{
static void Main()
{
Console.WriteLine(Singleton.Instance.Message);
}
}
You could of course change the 'instance' field to volatile and then the memory barrier would not be required. However, this incurs the overhead of doing a volatile read (read memory barrier) every time the 'Instance' property is called and therefore defeats the purpose of the performance optimisation -- although volatile is very quick, it is not as quick as just incurring the performance penalty of the System.Threading.Thread.MemoryBarrier the first time you initialise the 'instance' field.
You should always use a System.Security.SecureString when creating .NET APIs that handle passwords. Unfortunately many of the .NET Framework APIs do not yet use System.Security.SecureString. This will change in the future, but for now, how do you handle the marshaling of a System.Security.SecureString to an API that uses System.String without leaving traces of the password all over memory. Here is one work-around for this problem.
First off, what is wrong with using System.String to represent passwords:
1. The GC is free to move the System.String around in memory (unknown to the developer). When moving the memory buffer around in memory, copies of the string are left all over the now free areas of the process memory. For efficiency, the GC does not zero the memory that it frees. So even though your code may only have 'one copy' of the string, in memory there could be thousands of copies of the string that are no longer accessible by a reference or pointer and will remain there until that memory is used for something else and gets overwritten.
2. Whenever you perform some System.String manipulation, a new System.String is created. This is because System.String is immutable. Thus, if I append characters onto a System.String in order to build up a password, pieces of the password are left all over the process memory by the temporary strings that are created.
An example of a .NET Framework API that should use a System.Security.SecureString to represent a password, but instead uses a System.String, is the WCF System.ServiceModel.Security.UserNamePasswordClientCredential.Password property. There are many examples of this in the .NET Framework. One in particular that relates to a post below about the SQL OPEN { MASTER | SYMMETRIC } KEY relates to ADO.NET not accepting a System.Security.SecureString as a mapable type to a string-based parameter. This means that all passwords sent to the database via ADO.NET must be System.String.
There is some good news though, the System.Diagnostics.Process class uses a System.Security.SecureString to represent the password.
So, how do we marshal a System.Security.SecureString to a System.String temporarily, and then guarantee that traces aren't left all over the process memory. We need to implement two things:
1. Guarantee that the .NET GC does not move the System.String around in memory.
2. Guarantee a deterministic freeing of the string, zeroing out the memory.
To accomplish the first one, we will pin the string in memory using the System.Runtime.InteropServices.GCHandle class.
To accomplish the second, we will use managed pointers (unsafe code) to mutate the System.String and the strong guarantees of the System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup method.
Step 1: Capture a System.Security.SecureString password from the command line.
SecureString password = ConsoleUtility.CapturePassword("Enter password: ");
See my post below for the ConsoleUtility code. http://dotnet.org.za/markn/archive/2008/10/03/simple-console-password-capture-utility.aspx
Step 2: Allocate an "empty" System.String (filled with zeros '\0') of the correct length.
int length = password.Length;
var insecurePassword = new string('\0', length);
This creates a string filled with the '\0' character for the same length as the password. At this point, the GC is still free to move this string around in memory, leaving copies of it in memory as it goes. We are not concerned with this, because the string does not contain any sensitive information yet.
Step 3: Pin the System.String so that the GC cannot move it around in memory.
GCHandle gch = GCHandle.Alloc(insecurePassword, GCHandleType.Pinned);
Once pinned by a GC handle, the GC cannot move the memory until GCHandle.Free is called.
Step 4: Marshal the System.Security.SecureString to the pinned System.String.
1: IntPtr passwordPtr = Marshal.SecureStringToBSTR(password);
2: var pPassword = (char*)passwordPtr;
3: var pInsecurePassword = (char*)gch.AddrOfPinnedObject();
4: for (int index = 0; index < length; index++)
5: {
6: pInsecurePassword[index] = pPassword[index];
7: }
8: Marshal.ZeroFreeBSTR(passwordPtr);
Line 1 marshals the encrypted System.Security.SecureString password to an unmanaged buffer. Line 2 gets the managed pointer to the unmanaged buffer. Line 3 gets the managed pointer to the pinned managed System.String that we are going to copy the password into. Lines 4 to 7 copy the password into the System.String buffer (mutating the immutable string). Line 8 zeros and frees the unmanaged buffer holding the marshaled password.
NOTE: This code must be executed in an unsafe context. Place the unsafe keyword around the code or on the method. Enable your assembly to use unsafe code under Properties, Build, Allow unsafe code.
Step 5: Use the System.String password.
Console.WriteLine(insecurePassword);
Here we would pass the System.String to the .NET API that requires a System.String. In this case I just output the password to the console. In a real situation this would be a more useful call.
Step 6: Deterministically zero the pinned System.String password.
for (int index = 0; index < length; index++)
{
pInsecurePassword[index] = '\0';
}
This overwrites (mutates) the System.String memory with zeros '\0'.
Verify that the System.String memory has been overwritten.
Console.WriteLine(insecurePassword);
Step 7: Unpin the System.String password.
Now the System.String is free to be garbage collected by the GC, but the string does not contain any sensitive information, so we don't care that the string can now move around in memory or that there is a delay before the string is actually GC'ed.
That is the basic structure of the code. However, there are potential memory leaks in this code in the face of exceptions and asynchronous exceptions such as ThreadAbortException, OutOfMemoryException and StackOverflowException, etc.
In Step 3, we need to protect the GCHandle.Alloc to ensure that we do not leak a GC handle. An asynchronous exception, such as ThreadAbortException may be thrown after the GCHandle.Alloc method has allocated the handle, but before it is assigned to the variable 'gch', thus leaking the handle. To protect this allocation and the assignment to the variable we use a CER (Constrained Execution Region).
Step 3 changes to the following code:
1: GCHandle gch;
2: RuntimeHelpers.PrepareConstrainedRegions();
3: try {} finally
4: {
5: gch = GCHandle.Alloc(insecurePassword, GCHandleType.Pinned);