Unique and non-unique non-clustered indexes on a unique clustered index

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan – you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

In the last weblog post I have talked about the difference of unique and non-unique clustered indexes. As you have seen SQL Server uses an internal overhead of 4 bytes (the so-called uniquifier) to make non-unique clustered index rows unique. Today I want to work out the difference between unique and non-unique non-clustered indexes defined on a table with a unique clustered index. As you already know SQL Server creates a unique clustered index when you define the PRIMARY KEY constraint on a table. On the other hand you can use the CREATE UNIQUE CLUSTERED INDEX statement to create a unique clustered index on a table. The following listing creates our customers table, creates a unique clustered index on it, and finally creates one unique- and one non-unique non-clustered index on that table.

-- Create a table with 393 length + 7 bytes overhead = 400 bytes 
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24 
CREATE TABLE Customers 
( 
    CustomerID INT NOT NULL, 
    CustomerName CHAR(100) NOT NULL, 
    CustomerAddress CHAR(100) NOT NULL, 
    Comments CHAR(189) NOT NULL 
) 
GO
  
-- Create a unique clustered index on the previous created table 
CREATE UNIQUE CLUSTERED INDEX idx_Customers ON Customers(CustomerID) 
GO  
– Insert 80.000 records 
DECLARE @i INT = 1 
WHILE (@i <= 80000) 
BEGIN 
    INSERT INTO Customers VALUES 
    ( 
        @i, 
        ‘CustomerName’ + CAST(@i AS CHAR), 
        ‘CustomerAddress’ + CAST(@i AS CHAR), 
        ‘Comments’ + CAST(@i AS CHAR) 
    )     
    SET @i += 1 
END 
GO

-- Create a unique non clustered index on the clustered table 
CREATE UNIQUE NONCLUSTERED INDEX idx_UniqueNCI_CustomerID 
ON Customers(CustomerName) 
GO  

-- Create a non-unique non clustered index on the clustered table 
CREATE NONCLUSTERED INDEX idx_NonUniqueNCI_CustomerID 
ON Customers(CustomerName) 
GO

After the creation of both non-clustered indexes you can use the DMV sys.dm_db_index_physical_stats to get some information about the indexes. When you look into the DMV, you can see that the unique non-clustered index has a record length of 107 bytes and the non-unique non-clustered index has a record length of 111 bytes. So again, there must be a difference in the internal storage format of both indexes! Let’s analyze it and start with the unique non-clustered index.

In my case the index root page of the unique non-clustered index is 4370, so I can dump it out very easily with the DBCC IND command:

DBCC PAGE(UniqueClusteredIndexStructure_NonClusteredIndex, 1, 4370, 3) 
GO

As you can see from the following figure each index record contains the non-clustered key (which is unique in this case) – the column CustomerName:

image

When you examine the byte by byte representation of the unique non-clustered index record, you can see that SQL Server uses here the following bytes:

  • 1 Byte: Status Bits
  • n Bytes: Unique Non-Clustered Index Key – in this case 100 bytes
  • 4 Bytes: PageID
  • 2 Bytes: FileID

In sum SQL Server uses the above mentioned 107 bytes per each index record on each non-leaf level of the unique non-clustered index. So again, the length of your non-clustered index key has an impact on how many rows SQL Server can store on an index page. So a CHAR(100) – like in this example – would be not a very good idea…

When you are walking down the unique non-clustered index until you reach the leaf-level of the index always stores the above mentioned 107 bytes per each index row – nothing more. When you finally dump out the leaf-level of the non-clustered index, you get the following picture:

>image

As you can see here, SQL Server stores here at the leaf-level directly the clustered key – in our case the value of the column CustomerID. This value is for SQL Server the pointer to the corresponding record in the clustered index. With this value in the hand, SQL Server can now find the record in the clustered index – through a Clustered Index Seek operator. This is a big difference compared to non-clustered indexes defined on a heap table. Because in a heap table, SQL Server uses at the leaf-level the HEAP RID to point DIRECTLY to the corresponding data page where the record is stored. Therefore SQL Server can directly read the correct data page without accessing an additional index!

This also implies that SQL Server can find a record through a non-clustered index on a heap table faster than a record through a non-clustered index on a clustered table, because SQL Server don’t have to execute the additional Clustered Index Seek operator. So the correct row can be found with less page reads on a heap table. But please don’t over estimate this detail, and think that you will get a performance benefit by using non-clustered indexes on heap tables. The fact is that SQL Server always tries to store the index pages in the Buffer Manager, so it’s really very cheap for SQL Server to do this additional Clustered Index Seek to get the correct record from the clustered index back.

Let’s now analyze our non-unique non-clustered index. When you dump out the index root page, you can see that SQL Server stores here the non-clustered index key and also the clustered index key, which is different from the previous example with the unique non-clustered index:

image

SQL Server needs here the unique clustered index key to make each non-unique non-clustered index key unique. This behavior is done on EACH LEVEL of the non-unique non-clustered index, from the index root page down to the leaf-level. This means that you have a huge storage overhead, because SQL Server stores in EACH INDEX RECORD also your unique clustered key besides the non-unique non-clustered index key. So when you have a badly chosen clustered key (like CHAR(100), etc.) it will even get much more worse for you! When you analyze the index row you can see that SQL Server uses the following bytes for the storage:

  • 1 Byte: Status Bits
  • n Bytes: Non-unique Non-Clustered Index Key – in this case 100 bytes
  • n Bytes: Unique Clustered Index Key – in this case 4 bytes for the integer value
  • 4 Bytes: PageID
  • 2 Bytes: FileID

When you sum up those bytes, you will get the 111 bytes mentioned earlier. So please keep this additional storage overhead in your head when using non-unique non-clustered indexes, because it impacts your non-clustered indexes on each level! You can download the T-SQL script for this posting here.

In the next installment of this series we will have finally a look into the differences of unique/non-unique non-clustered indexes defined on a non-unique clustered index. Stay tuned J

-Klaus

Unique/Non-Unique Clustered Indexes

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan – you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

In the last blog post I have talked about unique/non-unique clustered indexes on a heap table. A table without a clustered index is called a heap table in SQL Server. When you define a clustered index on such a table, the table data gets structured and is therefore referred as clustered table. In this blog post I want to talk about the differences in unique and non-unique clustered indexes, and what are the storage impacts between those 2 types of clustered indexes.

As a prerequisite I assume that you have a basic understanding of clustered indexes, and that you know the difference between heap and clustered tables, and how your data pages are structured when a clustered index is defined on a table.

Let’s start by looking on a unique clustered index. With SQL Server you have several possibilities to define a unique clustered index. The first way – the easy one – is to define a PRIMARY KEY constraint on a column. SQL Server enforces this PRIMARY KEY constraint through the creation of a unique clustered index on that table and that column. The another option is to create a unique clustered index through the CREATE CLUSTERED INDEX statement – but when you don’t specify the UNIQUE property, SQL Server will create a non-unique clustered index by default for you! The following code fragment creates the Customers table that you already know from the previous blog posting, but this time we create a PRIMARY KEY constraint on the column CustomerID. Therefore SQL Server creates a unique clustered index on that table and sorts the data pages in the leaf level according the values in the column CustomerID.

-- Create a table with 393 length + 7 bytes overhead = 400 bytes
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24
CREATE TABLE Customers
(
   CustomerID INT NOT NULL PRIMARY KEY IDENTITY(1, 1),
   CustomerName CHAR(100) NOT NULL,
   CustomerAddress CHAR(100) NOT NULL,
   Comments CHAR(189) NOT NULL
)
GO

-- Insert 80.000 records
DECLARE @i INT = 1
WHILE (@i <= 80000)
BEGIN
   INSERT INTO Customers VALUES
   (
      ‘CustomerName’ + CAST(@i AS CHAR),
      ‘CustomerAddress’ + CAST(@i AS CHAR),
      ‘Comments’ + CAST(@i AS CHAR)
   )

   SET @i += 1
END
GO

After we have identified the index root page (through the use of the DBCC IND command), we can dump out that page with the DBCC PAGE command. In my case the index root page is 775:

DBCC PAGE(UniqueClusteredIndexStructure, 1, 775, 3)
GO

As you can see from the following figure each index record contains the clustered key, in this case the value of the column CustomerID.

image

When you examine the byte by byte representation of a clustered index record, you can see that SQL Server uses here the following bytes:

  • 1 byte: Status Bits
  • n bytes: Clustered Key – in this case 4 bytes
  • 4 bytes: PageID
  • 2 bytes: FileID

As you can see the length of the clustered key has a direct relationship of the length of an index record. This mean as smaller your clustered key is, the more index record can be put onto an index page, and therefore your clustered index will be much more compact and will perform faster and are easier to maintain. When you walk down your clustered index you will see that all intermediate levels have the same storage format as described above. There are no differences on each level, expect the index leaf level, because this level contains your actual logically ordered data pages.

Let’s have now a look onto non-unique clustered indexes in SQL Server and how they differ from unique clustered indexes. To demonstrate this kind of indexes, I have just recreated the Customers table and created a non-unique clustered index on that table through the CREATE CLUSTERED INDEX statement:

-- Create a table with 393 length + 7 bytes overhead = 400 bytes
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24
CREATE TABLE Customers
( 
   CustomerID INT NOT NULL,
   CustomerName CHAR(100) NOT NULL,
   CustomerAddress CHAR(100) NOT NULL,
   Comments CHAR(181) NOT NULL
)
GO

-- Create a non unique clustered index
CREATE CLUSTERED INDEX idx_Customers_CustomerID
ON Customers(CustomerID)
GO

Finally I have inserted 80.000 records, where the column CustomerID (the clustered key) is not unique anymore:

-- Insert 80.000 records
DECLARE @i INT = 1
WHILE (@i <= 20000)
BEGIN
   INSERT INTO Customers VALUES
   (
      @i,
      ‘CustomerName’ + CAST(@i AS CHAR),
      ‘CustomerAddress’ + CAST(@i AS CHAR),
      ‘Comments’ + CAST(@i AS CHAR)
   )
   INSERT INTO Customers VALUES
   (
      @i,
      ‘CustomerName’ + CAST(@i AS CHAR),
      ‘CustomerAddress’ + CAST(@i AS CHAR),
      ‘Comments’ + CAST(@i AS CHAR)
   )
   INSERT INTO Customers VALUES
   (
      @i,
      ‘CustomerName’ + CAST(@i AS CHAR),
      ‘CustomerAddress’ + CAST(@i AS CHAR),
      ‘Comments’ + CAST(@i AS CHAR)
   )
 
   INSERT INTO Customers VALUES
   (
      @i,
      ‘CustomerName’ + CAST(@i AS CHAR),
      ‘CustomerAddress’ + CAST(@i AS CHAR),
      ‘Comments’ + CAST(@i AS CHAR)
   )

SET @i += 1
END
GO

When you now dump out the root index page of the non-unique clustered index, you get the following result:

image

As you can see, SQL Server returns here an additional column named UNIQUIFIER (key). This column is used by SQL Server to make a non-unique clustered key unique. Behind the scenes it is a 4 byte long integer value starting at 0. E.g. when you have 2 customers with the ID 1380 the first record gets the uniquifier value 0 and the second one gets the uniquifier value of 1. But SQL Server only stores the uniquifier in the navigation structure of an index (all levels above the leaf level), when the uniquifier is not equal to 0. SQL Server only includes uniquifier values of 0 in the navigation structure of a non-unique clustered index, which means that the navigation structure will never store the uniquifier physically. The only place where the uniquifier is stored in a non-unique clustered index is on the data pages, where the actual data records are stored. The following figure shows a data page dump of our clustered index, where you can also see the stored uniquifier.

image

So the only difference between a unique and non-unique clustered index is on the data pages, because when using a non-unique clustered index, SQL Server will use the 4 byte long uniquifier to make them unique, which is a small storage overhead that you have to keep in mind, when working with non-unique clustered indexes. You can download the T-SQL script for this posting here.

In the next posting we will work out the differences between unique/non-unique non-clustered indexes defined on unique clustered indexes. Stay tuned :-)

-Klaus

Unique and non-unique SQL Server indexes on a heap table

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan – you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

In the upcoming weblog postings I want to work out the differences between unique and non-unique indexes in SQL Server. I assume that you already know the concepts about clustered- and non clustered indexes and how they are used in SQL Server.

In the past I’ve done a lot of trainings and consulting regarding SQL Server performance tuning and it seems that some people doesn’t know the differences and implications between unique and non-unique indexes. And as you will see in the upcoming postings there are really big differences how SQL Server stores those two variants that impact the size and the efficiency of your indexes.

Let’s start today with unique and non unique non clustered indexes on a table without a clustered index, a so-called heap table in SQL Server. The following listing shows how to create our test table and populate it with 80.000 records. Each record needs 400 bytes, therefore SQL Server can put 20 records on each data page. This means that our heap table contains 4.000 data pages and 1 IAM page.

-- Create a table with 393 length + 7 bytes overhead = 400 bytes 
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24 
CREATE TABLE CustomersHeap 
( 
    CustomerID INT NOT NULL, 
    CustomerName CHAR(100) NOT NULL, 
    CustomerAddress CHAR(100) NOT NULL, 
    Comments CHAR(189) NOT NULL 
) 
GO

-- Insert 80.000 records 
DECLARE @i INT = 1 
WHILE (@i &lt;= 80000) 
BEGIN 
    INSERT INTO CustomersHeap VALUES 
    ( 
        @i, 
        ‘CustomerName’ + CAST(@i AS CHAR), 
        ‘CustomerAddress’ + CAST(@i AS CHAR), 
        ‘Comments’ + CAST(@i AS CHAR) 
    )     
    SET @i += 1 
END 
GO

-- Retrieve physical information about the heap table 
SELECT * FROM sys.dm_db_index_physical_stats 
( 
    DB_ID(‘NonClusteredIndexStructureHeap’), 
    OBJECT_ID(‘CustomersHeap’), 
    NULL, 
    NULL, 
    ‘DETAILED’ 
) 
GO

After the creation of the heap table and the data loading, you can now define a unique and non-unique non-clustered index on the column CustomerID of our heap table. We will define both indexes on the same column so that we can analyze the differences between unique- and non-unique non-clustered indexes.

-- Create a unique non clustered index 
CREATE UNIQUE NONCLUSTERED INDEX IDX_UniqueNCI_CustomerID 
ON CustomersHeap(CustomerID) 
GO  

-- Create a non-unique non clustered index 
CREATE NONCLUSTERED INDEX IDX_NonUniqueNCI_CustomerID 
ON CustomersHeap(CustomerID) 
GO 

If you want to define a unique non-clustered index on a column that doesn’t contain unique data, you will get back an error message from SQL Server. Important to know is that SQL Server creates a non-unique non-clustered index if you don’t specify the UNIQUE property when creating a non-clustered index. So by default you will always get a non-unique non-clustered index!

After the creation of both indexes you can analyze their size, their index depth, their size etc. with the DMV sys.dm_db_index_physical_stats. You can to pass in as the 3rd parameter the index-id. The IDs of all non-clustered indexes starts at 2, therefore the first non-clustered index gets the ID 2 and the second one the ID 3.

-- Retrieve physical information about the unique non-clustered index 
SELECT * FROM sys.dm_db_index_physical_stats 
( 
    DB_ID(‘NonClusteredIndexStructureHeap’), 
    OBJECT_ID(‘CustomersHeap’), 
    2, 
    NULL, 
    ‘DETAILED’ 
) 
GO

-- Retrieve physical information about the non-unique non-clustered index 
SELECT * FROM sys.dm_db_index_physical_stats 
( 
    DB_ID(‘NonClusteredIndexStructureHeap’), 
    OBJECT_ID(‘CustomersHeap’), 
    3, 
    NULL, 
    ‘DETAILED’ 
) 
GO

As you can see from both outputs, the index root page of the unique non-clustered index is occupied of around 24%, where the index root page of the non-unique non-clustered index is occupied of around 39%, so there must be a difference in the storage format of unique/non-unique non-clustered indexes on a heap table! In the next step we create a simple helper table that stores the output of the DBCC IND command. The structure of this helper table is directly taken from the excellent book SQL Server 2008 Internals.

-- Create a helper table 
CREATE TABLE sp_table_pages 
( 
   PageFID TINYINT, 
   PagePID INT, 
   IAMFID TINYINT, 
   IAMPID INT, 
   ObjectID INT, 
   IndexID TINYINT, 
   PartitionNumber TINYINT, 
   PartitionID BIGINT, 
   iam_chain_type VARCHAR(30), 
   PageType TINYINT, 
   IndexLevel TINYINT, 
   NextPageFID TINYINT, 
   NextPagePID INT, 
   PrevPageFID TINYINT, 
   PrevPagePID INT, 
   PRIMARY KEY (PageFID, PagePID) 
) 
GO

After the creation of this helper table we can dump out all pages that are belonging to our non-clustered indexes to this helper table with the following two calls to DBCC INC in combination with the INSERT INTO statement:

-- Write everything in a table for further analysis 
INSERT INTO sp_table_pages 
EXEC(‘DBCC IND(NonClusteredIndexStructureHeap, CustomersHeap, 2)’) 
GO

-- Write everything in a table for further analysis 
INSERT INTO sp_table_pages 
EXEC(‘DBCC IND(NonClusteredIndexStructureHeap, CustomersHeap, 3)’) 
GO

Now we can start analyzing our non-clustered indexes by using the undocumented DBCC PAGE command. You can find more information about this great command on Paul Randal’s weblog. To get some information back from DBCC PAGE you have to enable the flag 3604 of DBCC:

DBCC TRACEON(3604) 
GO

Let’s dump out the index root page of our unique non-clustered index by the following command:

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4192, 3) 
GO

This will result in the following result in SQL Server Management Studio:

image

As you can see from this screenshot SQL Server stores the child page of the B-tree where the minimum key of the non-clustered index is located. The child page 4161 contains for example the record with the minimum key of 540 up to the maximum key of 1078. When you dump out the index root page with the dump option 1 you get the byte by byte representation of all index records on the index root page:

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4192, 1) 
GO

SQL Server needs here 11 bytes for storing an index row. These 11 bytes are storing the following information:

  • 1 byte: Status Bits
  • 4 bytes: Customer ID, like 540
  • 4 bytes: child PageID, like 4161
  • 2 bytes: FileID, like 1

As you can see it’s up to the length of the non-clustered key how long an index row is. This also means that SQL Server is able to store more index rows on an index page if you choose a smaller non-clustered key. If you choose for example a CHAR(100) as a non-clustered index key, then SQL Server needs more index pages for your non-clustered index, which is not so efficient as using a smaller index key. The T-SQL script enclosed to this posting shows you how you can decode those bytes from the hexadecimal representation.

Finally you can dump out the child page 4161, which is located on the leaf-level of the non-clustered index.

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4161, 3) 
GO

image

As you can see from the figure, SQL Server stores for each index key on which data page and on which slot the corresponding record is located. Because we have not defined a clustered index on our table, SQL Server uses here the RID (Row Identifier) to point to the correct record on the data page. Index pages on the leaf-level on a heap table are different from leaf-level index pages defined on a clustered table (a table that contains a clustered index).When you dump out the leaf-level index page of the non-clustered index you can see that SQL Server needs 13 bytes per index row:

  • 1 byte: Status Bits
  • 4 bytes: CustomerID, like 540
  • 4 bytes: PageID, like 178,
  • 2 bytes: FileID, like 1
  • 2 bytes: Slot number, like 19

Finally with this information in your hand, it is very easy to locate the correct record on the data page, because you know the PageID, FileID, and also the slot number where the record on the data page is located. Easy, isn’t it?

Let’s move on now to non-unique non-clustered indexes. Earlier we have already created such an index, which gets the index-id of 3 from SQL Server, because it’s the second non-clustered index we have defined. In my case the index root page of the non-unique non-clustered index is located on page 4264, therefore I dump it out with the following command:

DBCC PAGE(NonClusteredIndexStructureHeap, 1, 4264, 3) 
GO

image

But wait! Now the result from DBCC PAGE on the root index page on a non-unique non-clustered index is different! As you can see SQL Server returns here an additional column named “HEAP RID (key)“. The value in this column is used to make your non-unique non-clustered index unique. The HEAP RID column uses 8 additional bytes in your index row, which encodes the following information that are granted to be unique on a heap table:

  • 4 bytes: PageID, like 178
  • 2 bytes: FileID, like 1
  • 2 bytes: Slot number, like 19

The overead of a non-unique non-clustered index on a heap table costs you 8 additional bytes per index row – on all index levels, expect the leaf-level, because SQL Server stores here always the HEAP RID as you have seen previously! So please keep this 8 bytes of additional index record overhead in mind, when you create non-clustered indexed that are NOT unique! And as I have said earlier, they are NOT unique by default!!!

In this example your non-unique non-clustered index is about 2 times bigger than the unique non-clustered index, because the unique index needs 11 bytes and the non-unique index needs 19 bytes (overhead of 8 bytes). When you look back to the output of the DMV sys.dm_db_index_physical_stats you can see that the index root page of the unique non-clustered index has a page space usage of around 24% where the index root page of the non-unique non-clustered index has a page space usage of around 39%. This will make a big difference on large non-clustered indexes!

image

So if you are just defining non-clustered indexes with

CREATE NONCLUSTERED INDEX …

without thinking about the uniqueness of your data, you are wasting a lot of storage in your non-clustered indexes which also impacts the performance of your non-clustered indexes and their ongoing maintenance.
You can download the T-SQL script for this posting here.

In the next installment of this series we will have a look into the differences of unique clustered indexes and unique/non unique non-clustered indexes. Stay tuned :-)

-Klaus