Unique and non-unique non-clustered indexes on a unique clustered index
In the last weblog post I have talked about the difference of unique and non-unique clustered indexes. As you have seen SQL Server uses an internal overhead of 4 bytes (the so-called uniquifier) to make non-unique clustered index rows unique. Today I want to work out the difference between unique and non-unique non-clustered indexes defined on a table with a unique clustered index. As you already know SQL Server creates a unique clustered index when you define the PRIMARY KEY constraint on a table. On the other hand you can use the CREATE UNIQUE CLUSTERED INDEX statement to create a unique clustered index on a table. The following listing creates our customers table, creates a unique clustered index on it, and finally creates one unique- and one non-unique non-clustered index on that table.
-- Create a table with 393 length + 7 bytes overhead = 400 bytes -- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24 CREATE TABLE Customers ( CustomerID INT NOT NULL, CustomerName CHAR(100) NOT NULL, CustomerAddress CHAR(100) NOT NULL, Comments CHAR(189) NOT NULL ) GO -- Create a unique clustered index on the previous created table CREATE UNIQUE CLUSTERED INDEX idx_Customers ON Customers(CustomerID) GO – Insert 80.000 records DECLARE @i INT = 1 WHILE (@i <= 80000) BEGIN INSERT INTO Customers VALUES ( @i, ‘CustomerName’ + CAST(@i AS CHAR), ‘CustomerAddress’ + CAST(@i AS CHAR), ‘Comments’ + CAST(@i AS CHAR) ) SET @i += 1 END GO -- Create a unique non clustered index on the clustered table CREATE UNIQUE NONCLUSTERED INDEX idx_UniqueNCI_CustomerID ON Customers(CustomerName) GO -- Create a non-unique non clustered index on the clustered table CREATE NONCLUSTERED INDEX idx_NonUniqueNCI_CustomerID ON Customers(CustomerName) GO
After the creation of both non-clustered indexes you can use the DMV sys.dm_db_index_physical_stats to get some information about the indexes. When you look into the DMV, you can see that the unique non-clustered index has a record length of 107 bytes and the non-unique non-clustered index has a record length of 111 bytes. So again, there must be a difference in the internal storage format of both indexes! Let's analyze it and start with the unique non-clustered index.
In my case the index root page of the unique non-clustered index is 4370, so I can dump it out very easily with the DBCC IND command:
DBCC PAGE(UniqueClusteredIndexStructure_NonClusteredIndex, 1, 4370, 3) GO
As you can see from the following figure each index record contains the non-clustered key (which is unique in this case) – the column CustomerName:
When you examine the byte by byte representation of the unique non-clustered index record, you can see that SQL Server uses here the following bytes:
- 1 Byte: Status Bits
- n Bytes: Unique Non-Clustered Index Key – in this case 100 bytes
- 4 Bytes: PageID
- 2 Bytes: FileID
In sum SQL Server uses the above mentioned 107 bytes per each index record on each non-leaf level of the unique non-clustered index. So again, the length of your non-clustered index key has an impact on how many rows SQL Server can store on an index page. So a CHAR(100) – like in this example – would be not a very good idea…
When you are walking down the unique non-clustered index until you reach the leaf-level of the index always stores the above mentioned 107 bytes per each index row – nothing more. When you finally dump out the leaf-level of the non-clustered index, you get the following picture:
As you can see here, SQL Server stores here at the leaf-level directly the clustered key – in our case the value of the column CustomerID. This value is for SQL Server the pointer to the corresponding record in the clustered index. With this value in the hand, SQL Server can now find the record in the clustered index – through a Clustered Index Seek operator. This is a big difference compared to non-clustered indexes defined on a heap table. Because in a heap table, SQL Server uses at the leaf-level the HEAP RID to point DIRECTLY to the corresponding data page where the record is stored. Therefore SQL Server can directly read the correct data page without accessing an additional index!
This also implies that SQL Server can find a record through a non-clustered index on a heap table faster than a record through a non-clustered index on a clustered table, because SQL Server don't have to execute the additional Clustered Index Seek operator. So the correct row can be found with less page reads on a heap table. But please don't over estimate this detail, and think that you will get a performance benefit by using non-clustered indexes on heap tables. The fact is that SQL Server always tries to store the index pages in the Buffer Manager, so it's really very cheap for SQL Server to do this additional Clustered Index Seek to get the correct record from the clustered index back.
Let's now analyze our non-unique non-clustered index. When you dump out the index root page, you can see that SQL Server stores here the non-clustered index key and also the clustered index key, which is different from the previous example with the unique non-clustered index:
SQL Server needs here the unique clustered index key to make each non-unique non-clustered index key unique. This behavior is done on EACH LEVEL of the non-unique non-clustered index, from the index root page down to the leaf-level. This means that you have a huge storage overhead, because SQL Server stores in EACH INDEX RECORD also your unique clustered key besides the non-unique non-clustered index key. So when you have a badly chosen clustered key (like CHAR(100), etc.) it will even get much more worse for you! When you analyze the index row you can see that SQL Server uses the following bytes for the storage:
- 1 Byte: Status Bits
- n Bytes: Non-unique Non-Clustered Index Key – in this case 100 bytes
- n Bytes: Unique Clustered Index Key – in this case 4 bytes for the integer value
- 4 Bytes: PageID
- 2 Bytes: FileID
When you sum up those bytes, you will get the 111 bytes mentioned earlier. So please keep this additional storage overhead in your head when using non-unique non-clustered indexes, because it impacts your non-clustered indexes on each level! You can download the T-SQL script for this posting here.
In the next installment of this series we will have finally a look into the differences of unique/non-unique non-clustered indexes defined on a non-unique clustered index. Stay tuned J