Managing your SQL Log files
Part 2, covering how the transaction log is affected by the recovery model and how it affects data loss, is now out
OK, everyone who has put some thought into the setup of their transaction log
file, please raise your hand...

...I thought so.
The sad truth is that there are very few people who think through the log
files for their production systems - often with dramatic results. Each and
every single SQL Server database has both a data file and a log file, both of
which serve an important function. The data file stores the actual data in the
database whilst the log file stores all changes made to the data stored in the
data file. Why? What is the purpose of storing (basically) the same
information twice?
why do we need a log file?
Log files help satisfy the ACID requirements of RDBMS's. Just
about every command that gets issued against your database is stored in the log
file (there are some exceptions). Some clarification: Every command you issue
stores the following in the log file:
- The start time for the transaction
- Before and after images of the data
- Whether each transaction was committed or not
- All allocation changes to database pages
- As well as a few other things...
Why? I mean, what is the point? OK, example time. You are a DBA responsible
for maintaining a line of business database. Being the conscientious DBA you
are, you have scheduled regular full backups - twice a day in fact! Now, image
a situation where 3 minutes and 12 seconds before a backup is scheduled to run
some twit of a developer forgets to put a where clause in his update statement
(or you get a torn page or something, anything bad). What do you do?
I'm going to assume that the developers transaction has been committed.
First and foremost, you decide to restore your last full backup... from a full
half day ago. Doing well so far, but where do you get the rest of the days
transactions from? The transaction log file, of course! After you've restored
your latest backup you can roll the transaction log forward to recover the rest
of your transactions meaning that even though your most up to date backup was
out of date, you haven't lost anything. True story!
how does it work?
The reason this works is very simple. SQL Server does not commit any changes
to the data file unless they have been committed to the log file first. This
ensures that no matter what happens, no data is written to disk without there
being a record in the transaction log. This is known as a Write Ahead Log
(WAL). The process behind WALing works as follows: SQL maintains a cache of
data pages which are to be read or updated. Whenever data is modified, the
cache is modified NOT the physical data file. The modification is only written
to the physical file once a checkpoint is reached or SQL needs to use the cache
for something else. Every modification results in an entry in the transaction
log. The transaction log entry is written to disk BEFORE the data
modification. If it were to happen the other way around, the data modification
could create a transaction which you could not roll back from.
In this way, you have a guaranteed source of data as well as an exact plan of
how to get there unless you mess your backups and checkpoints up... which brings
us to the next point:
"my log file is too large!"
The most common complaint regarding log files is "My log file is too large!"
Granted, this is a problem, and I'm fairly sure many of you have already
experience this problem, but I'm willing to bet that this has become a problem
due to the way you've decided to manage your log file. If you've left your
database setup to someone who doesn't understand how log files work you are
almost guaranteed to have this problem.
As mentioned before, log files are used to store a history of all
transactions that have been performed against your database. Straight away you
would think this would cause a problem. A record of every single transaction?
How is that NOT going to get extremely large? Valid question, and the
answer is in how the log file works. Every time you make a backup of your
database you are invalidating the need for a record of all the transactions
before the backup. Granted, you can't rollback transactions anymore (not
entirely true) but you have a copy of the data somewhere other than in the data
file.
In order to help SQL Server maintain the transactions which can have their
logs discarded and the transactions which can't, the transaction log file is
split into an Active and a Inactive portion. The active portion cannot be
altered as this is the portion of the log file which is used to roll-forward
transactions that were not saved at the last checkpoint. SQL Server keeps a
pointer to the start of the active portion using a MinLSN (Minimum recovery Log
Sequence Number). This is a pointer to a transaction in the log file (not
necessarily the start) which tells SQL Server that everything after the pointer
is an Active Transaction, everything before the MinLSN has been written to disk and can be altered.
The MinLSN gets moved when one of the following occurs:
- A BACKUP LOG command completes
- A CHECKPOINT command completes (only in Simple recover model) - either by
using the CHECKPOINT command, or by the system issuing a checkpoint.
CHECKPOINT's are cause by the following:
- Executing the CHECKPOINT command
- A transaction not logged completely is executed (Bulk inserts, Select
Into's...)
- A database backup is taken
- The database is shutdown
- Database files are altered in some way (ALTER DATABASE commands)
- The database issues checkpoints based on the Recover Interval (SQL estimates
how many logs can be processed during the Recovery Interval)
- In Simple Recovery mode, when the log file becomes 70% full
Don't expect your log file to shrink in size simply because you issue a
CHECKPOINT command even if you are in simple recovery mode, if you have any long running transactions SQL cannot commit
it and therefore cannot update the MinLSN.
OK, so we know that log files help ensure that we have an up-to-date copy of
the data at all times and we know how it does this. We also know that we need
to be careful about choosing our recovery model and our backup procedures, but
we don't have any guidelines yet as to how to do this. To that end, next time
we'll be taking a look at the various recovery models and the way they affect
the growth of the transaction log. Until then though, take a look at your log
files and the general sizes they go through during the day. See if you can get
it down to a relatively consistent size regardless of the type of transactions
being issued against it.