The mystery of the NULL bitmap mask

In today's weblog post I want to talk about the NULL bitmap mask and some of the mysteries that you can encounter with it. The NULL bitmap mask encodes at the Storage Engine level which columns store a NULL value and which do not. It's a simple bitmap mask, where a set bit (1) means that the column contains a NULL value, and a cleared bit (0) means that the column contains an actual value (not a NULL value).

At first glance this sounds very straightforward, but when you look into the details, there are some implications with this approach. Let's first have a look at the concrete structure of a data row that the Storage Engine uses to physically store a record on disk. The following picture shows that structure:

This format is called the FixedVar format, because SQL Server always stores the fixed-length columns (like INT or CHAR) before the variable-length columns (like VARCHAR). As you can see, SQL Server stores 2 bytes of status bits at the beginning of the record, followed by 2 bytes that store the offset at which the fixed-length portion of the record ends. After the fixed-length data you find 2 bytes that store the column count, which is followed by the actual NULL bitmap mask.
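Written out in plain text, the FixedVar record layout looks roughly like this (a simplified sketch of the structure described above):

Status Bits A                   1 byte
Status Bits B                   1 byte
Length of fixed-length portion  2 bytes
Fixed-length column data
Column count                    2 bytes
NULL bitmap                     1 bit per column, rounded up to whole bytes
Variable-length column count    2 bytes (only present when there are variable-length columns)
Variable column offset array    2 bytes per variable-length column
Variable-length column data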

When you look at this for the first time, everything seems to make sense, but after a second look you start thinking, and you may ask: WHY does SQL Server store the actual column count in EVERY data row? The column count MUST be identical for every data row of a table, so why does SQL Server store redundant data?

The first (short) answer is very easy and logical: SQL Server needs the column count to calculate how many bytes are used for the actual NULL bitmap mask. Up to 8 columns in a table need 1 byte, 9 – 16 columns need 2 bytes, 17 – 24 columns need 3 bytes, and so on. Makes sense, but again: the column count for each record in a table MUST BE THE SAME!
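In other words, the NULL bitmap needs one bit per column, rounded up to whole bytes. A quick query just to illustrate the byte counts:

SELECT ColumnCount, CEILING(ColumnCount / 8.0) AS NullBitmapBytes
FROM (VALUES (8), (9), (16), (17), (24)) AS t (ColumnCount);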

Let's now have a more detailed look at the technically correct answer, along with some examples. First of all, you need to know that the column count stored in the data row reflects the number of columns that are CURRENTLY stored in the record at the Storage Engine level. This means that SQL Server can store a different number of columns in different physical data rows – ok, now it gets really confusing… So the column count in the physical data row does not have to match the column count in the table metadata layer (sys.columns). Those are also two really different layers inside the architecture of SQL Server.

So under which circumstances do those layers not match? It's very simple: when you add columns to an existing table! SQL Server behaves very differently depending on whether you add a NULL or a NOT NULL column to a table. When you just add a new NULL column, SQL Server only updates the table metadata layer, WITHOUT touching the Storage Engine layer. This means that none of your records gets physically changed when you add a NULL column. On the other hand, when you add a NOT NULL column, SQL Server updates the table metadata layer AND the Storage Engine layer, which means that SQL Server has to touch and rewrite EACH record in the table to which you have added the NOT NULL column. This makes a huge performance difference! For that reason SQL Server has to store the actual column count in each data record, because the column count need not be in sync with the table metadata layer.

Let's now have a more detailed look at a concrete example. For this example I have created a simple table with 8 columns, which means that SQL Server uses 1 byte for the NULL bitmap mask:
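A table definition along the following lines gives you 8 columns, 3 of them variable-length (the concrete column names and data types are just an illustration, chosen so that the row sizes shown below add up):

CREATE TABLE NullBitmapDemo
(
    Col1 INT NOT NULL,         -- fixed-length portion: 4 + 4 x 1000 = 4004 bytes
    Col2 CHAR(1000) NOT NULL,
    Col3 CHAR(1000) NOT NULL,
    Col4 CHAR(1000) NOT NULL,
    Col5 CHAR(1000) NOT NULL,
    Col6 VARCHAR(100) NULL,    -- 3 variable-length columns
    Col7 VARCHAR(100) NULL,
    Col8 VARCHAR(100) NULL
)
GO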

Afterwards I have inserted 2 records into the previously created table:
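With the illustrative table from above, two INSERT statements like these put 4204 bytes of data into each record:

INSERT INTO NullBitmapDemo VALUES (1, 'a', 'b', 'c', 'd', REPLICATE('x', 100), REPLICATE('y', 50), REPLICATE('z', 50));
INSERT INTO NullBitmapDemo VALUES (2, 'a', 'b', 'c', 'd', REPLICATE('x', 100), REPLICATE('y', 50), REPLICATE('z', 50));
GO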

When you dump out both data pages through the DBCC PAGE command, you can see that each record has a length of 4219 bytes (4204 bytes of data + 7 bytes row overhead + 2 bytes for the variable-length column count + 3 x 2 bytes for the variable column offset array entries).
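You can locate and dump the pages with the undocumented DBCC IND and DBCC PAGE commands; the database name and page number below are just placeholders that you have to replace with your own values:

DBCC IND ('DemoDB', 'NullBitmapDemo', -1);  -- lists all pages that belong to the table
GO
DBCC TRACEON (3604);                        -- send DBCC PAGE output to the client
DBCC PAGE ('DemoDB', 1, 153, 3);            -- database, file id, page id, print option
GO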

Now just add a new NULL column to the existing table:
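In the sketch below, Col9 is just an assumed column name:

ALTER TABLE NullBitmapDemo ADD Col9 INT NULL;
GO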

This is the 9th column in the table, which means that SQL Server would now need 2 bytes for the NULL bitmap mask. But SQL Server DOES NOT change the physical data rows at the Storage Engine level, because you are just adding a NULL column. SQL Server doesn't have to do anything at the Storage Engine level. You can prove that just by dumping out the 2 data pages again:

The records still have the same length of 4219 bytes, even though you have (logically) added a column to the table. Let's now try to update one record of the table, so that the previously added column gets an actual value:
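For example, the following statement touches only the first of the two records:

UPDATE NullBitmapDemo SET Col9 = 42 WHERE Col1 = 1;
GO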

When you dump out the data page where the 2nd record of the table is stored (the record that was not updated), its size is still the old 4219 bytes, while the updated record has been rewritten with the expanded NULL bitmap mask. You have now created a scenario where SQL Server stores NULL bitmap masks of different lengths within the same table. This also means that you can have scenarios where a table with only fixed-length columns has different row sizes at the Storage Engine level – sounds interesting, doesn't it? :-)

Let's now drop the table, recreate it, and insert the 2 records into it again. This time we are adding a NOT NULL column to the table:
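For the example table the statement looks like this (the default value of 42 is just an illustration):

ALTER TABLE NullBitmapDemo ADD Col9 INT NOT NULL DEFAULT 42;
GO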

Now SQL Server has to change EVERY record at the Storage Engine level, because the default value of the new column must be added (you must define a default value for the new column when the table already contains records), and SQL Server also has to expand the NULL bitmap mask.

This phenomenon can lead to serious performance problems when you are dealing with large tables and want to add a NOT NULL column. To give you an idea of the performance degradation, I inserted 1 million records into that table. When I added a NULL column, SQL Server needed only a few milliseconds, because it was just a metadata operation. But when I added a NOT NULL column to that table, it took around 40 seconds (!) until the ALTER TABLE completed. So as you can see, adding NOT NULL columns to big tables can cause a serious performance degradation!
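If you want to reproduce the timing difference yourself, the test boils down to something like the following sketch (the 1 million rows have to be inserted first, for example in a loop or from a numbers table):

SET STATISTICS TIME ON;
GO
ALTER TABLE NullBitmapDemo ADD ColNull INT NULL;                   -- metadata-only change, finishes in milliseconds
GO
ALTER TABLE NullBitmapDemo ADD ColNotNull INT NOT NULL DEFAULT 0;  -- rewrites every record, takes much longer
GO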

I hope you now understand why SQL Server has to store the actual column count in each record at the Storage Engine level, and why you can run into serious performance problems when you add NOT NULL columns to large tables in SQL Server.

Thanks for reading!

-Klaus

Replaying workloads with the RML Utilities

In today's weblog posting I want to talk about replaying SQL Server workloads with the RML Utilities. RML stands for Replay Markup Language, and the utilities are provided by Microsoft as a free download. You can download them here: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=7EDFA95A-A32F-440F-A3A8-5160C8DBE926. I first encountered the RML Utilities when I read the book “Professional SQL Server 2008 Internals and Troubleshooting” (ISBN 978-0470484289), written by Christian Bolton, Justin Langford, Brent Ozar, James Rowland-Jones, and Steven Wort.

The RML Utilities contain a replay engine that replays SQL Server workloads in real time. The above mentioned book states the following about those tools on page 462: “RML Utilities allow you to replay traces at the same rate they were captured. That means you get a better representation of true user activity and server workload than you do with other tools that condense the statements and execute them in sequence”.

That is also the real advantage of the RML Utilities: replaying traces at the same rate they were captured! I have several clients that have a really hard time when they want to replay workloads, e.g. in a test environment after they have added new indexes for performance optimization. SQL Server Profiler can also replay workloads, but it does not replay the identical workload at the same rate, so you are not able to reproduce scenarios like locking/blocking.

For a recent customer engagement I prepared a demo where I wanted to show how you can replay a workload that was generated by a .NET application. This sounds very easy, especially when you have read chapter 12 of the above mentioned book, but there are some really awful pitfalls that you have to know about when you are working with the RML Utilities and SQL Server workloads generated by .NET applications. With this weblog posting I want to show you those pitfalls and the workarounds you need to get everything working fine.

Let's start with the configuration needed to capture a SQL Trace for the RML Utilities. All you have to do is set up a server-side SQL Trace and capture everything in a .TRC file. The biggest problem here is the fact that you have to capture a lot of different events, which means that your .TRC files get really large. I've seen traces with several hundred GB of data in them. The RML Utilities come with a SQL script file that you can use as a basis for capturing such a SQL Trace (TraceCaptureDef.sql):
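The script defines the full list of required events and columns; the following is only a shortened sketch of such a server-side trace setup (the file path is a placeholder, and only a handful of the required events/columns are shown):

DECLARE @TraceID INT, @maxfilesize BIGINT, @on BIT;
SET @maxfilesize = 500;  -- roll over to a new file every 500 MB
SET @on = 1;

-- create the trace with file rollover enabled (option 2), keeping at most 20 files
EXEC sp_trace_create @TraceID OUTPUT, 2, N'D:\Traces\ReplayTrace', @maxfilesize, NULL, 20;

-- a few of the events/columns needed for replay (TraceCaptureDef.sql lists many more)
EXEC sp_trace_setevent @TraceID, 10, 1, @on;   -- RPC:Completed, TextData
EXEC sp_trace_setevent @TraceID, 10, 12, @on;  -- RPC:Completed, SPID
EXEC sp_trace_setevent @TraceID, 12, 1, @on;   -- SQL:BatchCompleted, TextData
EXEC sp_trace_setevent @TraceID, 12, 12, @on;  -- SQL:BatchCompleted, SPID
EXEC sp_trace_setevent @TraceID, 14, 12, @on;  -- Audit Login, SPID
EXEC sp_trace_setevent @TraceID, 15, 12, @on;  -- Audit Logout, SPID

-- start the trace
EXEC sp_trace_setstatus @TraceID, 1;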

As you can see from the previous listing, you have to collect a lot of different SQL Trace events that are needed for the replay functionality. If you don't capture them, you are not able to replay the captured workload. A good practice here is to use rollover files to keep the size of the individual .TRC files manageable.

After you have configured and started your SQL Trace, you can use your .NET application that interacts with SQL Server. As soon as you have captured the relevant workload, you can stop and delete the SQL Trace on SQL Server. You now have the .TRC files with the captured workload in your file system.

And now the real fun begins. If you follow the descriptions from chapter 14, you will fail immediately when replaying the captured workload. The problem is that the RML Utilities have several bugs/problems when they have to replay workloads that were captured from .NET applications that connect to SQL Server through MARS (Multiple Active Result Sets). I encountered those problems because for my demonstration I used a .NET application (which uses MARS) that I had written back in 2006. I just wanted to be as realistic as possible. All those problems are described in the following weblog posting from the CSS SQL Server Engineers: http://blogs.msdn.com/b/psssql/archive/2009/01/21/prb-rml-utilities-readtrace-and-how-to-workaround-mars.aspx.

The first thing that you have to do with your SQL Trace workload is to load it into a trace table in a SQL Server database with the following command:
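A typical way to do this is with the fn_trace_gettable function (the file path and the name of the trace table are placeholders):

SELECT * INTO TraceTable
FROM sys.fn_trace_gettable(N'D:\Traces\ReplayTrace.trc', DEFAULT);
GO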

Please keep in mind that this process can also take some time when you are working with large .TRC files. After you have loaded the trace into your trace table, you have to change it as described in the above mentioned weblog posting:

Finally you can open the trace table in SQL Server Profiler and export everything to a new .TRC file in the file system. Again, this can take a large amount of time when you are working with a large trace table.

In the next step you can convert the changed .TRC file into RML files that can be replayed against a SQL Server instance. For that purpose you use the readtrace.exe utility along with the trace flag T28. See the following command line:

readtrace.exe -Id:\MyNewTrace.trc -od:\PerfTesting_Output -T28

The problem that I encountered with this approach is that I got RML files that can't be replayed against SQL Server. Here is the reason: each RML file includes the SPID in its file name, and I got RML files like “SQL33333.rml” – and 33333 is an invalid SPID in SQL Server, because SQL Server can “only” handle 32767 connections. For that reason I changed the file names to SPIDs that are supported by SQL Server. If you have a large number of RML files, it would also make sense to write a simple program that renames the RML files to valid file names.

After you have changed each file name to a valid SPID, you can replay the workload with the ostress.exe utility:

ostress.exe -id:\PerfTesting_Output\*.rml -mreplay -csample.ini

As you can see, you have to do quite a few things when you want to replay workloads against SQL Server from .NET applications that are using MARS. But with this approach it is very easy to check in a test environment what the impact of a configuration change is, or how your application reacts when you add or drop indexes.

-Klaus

My SQLPASS session

Yesterday we got confirmation from SQLPASS whether our submitted sessions for the upcoming SQLPASS Summit in Seattle (October 11 – 14) were accepted. I'm very happy to announce that my submitted session “Advanced SQL Server 2008 Troubleshooting” made it to the summit as a regular session (75 min)! Here's the abstract that I submitted:

It's Friday, 05:00pm. You have just received an email informing you that your SQL Server has enormous performance problems! What can you do? How can you identify the problem and resolve it fast? Which tools does SQL Server provide for this task? In this session you will see a lot of different tools with which you can do effective SQL Server troubleshooting. You will learn how to identify performance bottlenecks with those tools, and how to resolve the identified bottlenecks.

I'm really proud to present this session, because all my past and future work is about SQL Server performance tuning and troubleshooting. I also presented a 1-day workshop about this topic at the previous SQLbits conference, where I got really good and promising feedback. See here for some attendee voices. I'm also organizing several 3-day workshops across Europe about this topic. So if you are not able to attend the SQLPASS Summit but are interested in the topic, you can also join one of my scheduled workshops here in Europe – see http://www.csharp.at/events.html for further information.

See you soon in Seattle!

-Klaus
