In today’s blog posting I want to talk about a very important topic in SQL Server that always confuses people when they first work with SQL Server. It is about the difference between the Primary Key constraint and the Clustered Index.
What is a Primary Key?
Let’s talk first about the Primary Key constraint itself. As the name implies it is just a constraint, and with that constraint you tell SQL Server that you want to have unique values in a specific column or in a specific group of columns. The following listing shows a very simple table definition where the Primary Key constraint is specified on the first column Col1.
CREATE TABLE Foo ( Col1 INT NOT NULL PRIMARY KEY, Col2 INT NOT NULL, Col3 INT NOT NULL ) GO
Now when you insert records into that table, SQL Server makes sure that you always have unique values in the column Col1. If you try to insert a duplicate value, SQL Server returns error message.
-- Try to insert a duplicate value INSERT INTO Foo Values (1, 1, 1), (1, 2, 2) GO
Msg 2627, Level 14, State 1, Line 9 Violation of PRIMARY KEY constraint ‘PK__Foo__A259EE544224D12A’. Cannot insert duplicate key in object ‘dbo.Foo’. The duplicate key value is (1). The statement has been terminated.
The Primary Key constraint itself is defined on a logical level – you just tell SQL Server that you want to have unique values in a specific column. But SQL Server also has to enforce that uniqueness on the physical level – in the data structures where you store your table data. In the case of SQL Server, the uniqueness on the physical level is enforced with an index structure – with a Clustered Index or Non-Clustered Index. Let’s have a more detailed look at this.
Enforcement of the Primary Key constraint
When you specify the Primary Key constraint, SQL Server enforces it by default with a Unique Clustered Index on the physical level. When you look at sys.indexes, you can see that under the covers SQL Server has generated a Unique Clustered Index that is used to enforce the Primary Key constraint.
-- SQL Server generates by default a Unique Clustered Index SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID('Foo') GO
As I have said, the Unique Clustered Index is created by default. You can also enforce a Primary Key constraint with a Unique Non-Clustered Index as shown in the following listing.
-- Enforces the Primary Key constraint with a Unique Non-Clustered Index CREATE TABLE Foo1 ( Col1 INT NOT NULL PRIMARY KEY NONCLUSTERED, Col2 INT NOT NULL, Col3 INT NOT NULL ) GO
When you specify the Primary Key constraint, you can specify the following 2 options:
The option CLUSTERED is the default one, and therefore you don’t have to specify it. When you look now again at sys.indexes, you can see now that you have a heap table in front of you (a table without a Clustered Index), and that SQL Server has generated an additional Unique Non-Clustered Index to enforce the Primary Key constraint.
-- SQL Server has generated now a Unique Non-Clustered Index to -- enforce the Primary Key constraint SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID('Foo1') GO
Primary Key <> Clustered Index
Therefore it doesn’t mean that a Primary Key and a Clustered Index is always the same in SQL Server. It is the same by default, but you can change this behavior if you want. The Primary Key constraint is always on the logical level, and the index structure is on the physical level to enforce the constraint itself.
The question now is when does it make sense to enforce the Primary Key constraint with a Unique Non-Clustered Index? Back in April 2014 I wrote about a problem in SQL Server that is called Last Page Insert Latch Contention: an ever-increasing Clustered Key column like an INT IDENTITY column doesn’t really scale in SQL Server.
If you hit that specific problem, it *could* make more sense to physically cluster/sort your table data on a random value – like a UNIQUEIDENTIFIER column. In that case you can still use the Primary Key constraint on the original ever-increasing key column, but enforce it with a Unique Non-Clustered Index, and cluster your table on a random key column. The following listing shows this approach.
-- Create the Primary Key constraint on an ever-increasing -- key column CREATE TABLE Foo2 ( Col1 INT NOT NULL PRIMARY KEY NONCLUSTERED, Col2 UNIQUEIDENTIFIER NOT NULL, Col3 INT NOT NULL ) GO -- Create the Clustered Index on a random key column CREATE UNIQUE CLUSTERED INDEX ci_Col2 ON Foo2(Col2) GO
When you look again at sys.indexes, you can see now that you have created a Clustered and Non-Clustered Index. But only the Non-Clustered Index is used to enforce the Primary Key constraint.
A Primary Key constraint and a Clustered Index is not really the same in SQL Server. By default SQL Server enforces the Primary Key constraint with a Unique Clustered Index. But if you want you could instead enforce it with a Unique Non-Clustered Index. But that approach wouldn’t really make sense as a default, because you need to have specific problems to justify that approach.
Thanks for your time,
14 thoughts on “Primary Key vs. Clustered Index”
Great post! It is good to see people addressing this issue in SQL Server, especially for those who come from another RDBMS technology where SQL Server’s treatment of PKs/Clustered Indexing is somewhat unique.
As you make good reference to, a primary key is enforced through an index which may be clustered or non-clustered, it doesn’t really matter. What does matter is that a table may only have one clustered index (as this defines the physical order of the table). I wonder if it is worth more explicit treatment that the primary key will only default to a clustered index if another clustered index does not already exist?
I’ve used this diagram I created to help show the distinction between constraints and indexes:
That’s great – nice 🙂
Thanks Klaus and other commentors – this can be a confusing thing that quickly slips into the “it is probably not too important” crack where it can lay dormant for years until it awakes to bite you where you least expected it!
Ultimately I think the SQL Server terminology was a bad choice since that is where the confusion starts, plus enforcing a default behaviour of clustered index.
I’ve written a bit more about it here: http://goodfast.info/now-make-it-fast
Very nice article.
Another great Article.
Thanks Klaus for interesting article. Why do we still need
Col1 INT NOT NULL PRIMARY KEY NONCLUSTERED
if Col2 becomes unique and clustered?
Thanks for your comment.
You still want to enforce the primary key, but this time with a Non-Clustered Index.
Excellent Sir ,I have not seen any better explanation of Primary and cluster Index before
Thanks for the great feedback, sir! 🙂
Nice post thanks for sharing!
Why would you ever want to cluster on a random value like a unique identifier? What makes more sense is to have the primary key a unique identifier and cluster on an incrementing value. Like you said, the clustered index determines the physical structure.
Does anyone test the performance primary key clustered with nonclustered when inserting
I don’t really understand the point of having a non-clustered incrementing primary key. Why not make the unique identifier the primary key? Now every time you try to get rows by the primary key, it needs to do another index lookup to get the clustered index value first.
I also don’t see how your described solution solves the performance problem, given that the last page of the [incrementing primary key, unique identifier] index will still experience the locking you describe on insertions since it is ordered by the primary key.