I’m a lazy person. I just want to do as little work as possible. I don’t want to think too much when I work. Yes, I know that sounds very terrible and will probably disqualifies me as a SQL Server consultant, but in today’s blog posting I want to show you how you can delegate the working and thinking process to the Query Optimizer when you want to create an indexing strategy for a specific query. Sounds interesting? If yes, then enter my world of index tuning 😉
The problematic query
Let’s have a look at the following query:
DECLARE @i INT = 999 SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, LineTotal FROM Sales.SalesOrderDetail WHERE ProductID < @i ORDER BY CarrierTrackingNumber GO
As you can see here I use a local variable in combination with an inequality predicate to retrieve some records from the table Sales.SalesOrderDetail. When you run that query and you look at the execution plan, you can see some serious problems with it.
- SQL Server has to scan the complete Clustered Index of the table Sales.SalesOrderDetail, because there is no supporting Non-Clustered Index. The query needs 1382 logical reads for this scan, and the elapsed time is around 800ms.
- The Query Optimizer introduced an explicit Filter operator in the query plan, which does a row-by-row comparison to check for qualifying rows (ProductID < @i)
- Because of the ORDER BY CarrierTrackingNumber, an explicit Sort operator is introduced in the execution plan.
- The Sort operator is spilled over to TempDb, because of the inaccurate Cardinality Estimation. With an inequality predicate in combination with local variables, SQL Server estimates hard-coded 30% of the rows from the base cardinality of the table. In our case the estimation is 36395 rows (121317 * 30%). In reality the query returns 120621 rows, which means that the Sort operator has to spill over to TempDb because the requested memory grant is just too small.
And now I ask you - how can you improve that query? What are your suggestions? Just take a break and think for a few minutes. How can you improve that query without changing the query itself?
Let’s tune the query!
Of course we have to work on our indexing strategy to make an improvement. Without a supporting Non-Clustered Index that’s the only plan that the Query Optimizer can use to run our query. But what is a good Non-Clustered Index for this specific query? Normally I will always start thinking about possible Non-Clustered Indexes by looking at the search predicate. In our case the search predicate is as follows:
WHERE ProductID < @i
We request rows filtered on the column ProductID. Therefore we want to create a supporting Non-Clustered Index on that column. So let’s create that index.
CREATE NONCLUSTERED INDEX idx_Test ON Sales.SalesOrderDetail(ProductID) GO
After the creation of the Non-Clustered Index we have to test our change, so we execute our original query from the first listing again. And guess what? The Query Optimizer is not using the Non-Clustered Index that we just created! We have created a supporting Non-Clustered Index on the search predicate, and the Query Optimizer is not referencing it? Normally people are already out of luck at this point. But we can hint the Query Optimizer to use the Non-Clustered Index to get a better understanding of *why* the Query Optimizer hasn’t chosen the index automatically:
DECLARE @i INT = 999 SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, LineTotal FROM Sales.SalesOrderDetail WITH (INDEX(idx_Test)) WHERE ProductID < @i ORDER BY CarrierTrackingNumber GO
When you now look at the execution plan, you can see the following beast – a parallel plan!
The query takes 370244 logical reads! And the elapsed time is almost the same as previously with around 800ms. What the heck is going on here? When you look in more detail at the execution plan, you can see that the Query Optimizer has introduced a Bookmark Lookup, because the previous created Non-Clustered Index is not a Covering Non-Clustered Index for this query. The query is over the so-called Tipping Point, because we are retrieving almost all rows with our current search predicate. Therefore it doesn’t make sense to use the Non-Clustered Index in combination with a very expensive Bookmark Lookup.
Instead of thinking why the Query Optimizer hasn’t chosen the previous created Non-Clustered Index, we have just delegated that thinking process to the Query Optimizer itself, and have asked it through the query hint, why that Non-Clustered Index wasn’t chosen automatically. As I said at the beginning: I don’t want to think too much ☺.
To solve that problem with the Non-Clustered Index we have to include the additional requested columns from the SELECT list in the leaf level of the Non-Clustered Index. You can look again at the Bookmark Lookup to see which columns are currently missing in the leaf level:
Let’s recreate that Non-Clustered Index:
CREATE NONCLUSTERED INDEX idx_Test ON Sales.SalesOrderDetail(ProductID) INCLUDE (CarrierTrackingNumber, OrderQty, UnitPrice, UnitPriceDiscount) WITH ( DROP_EXISTING = ON ) GO
We have made another change, so we have to test our change again by running our query. But this time we run the query without the query hint, because the Query Optimizer should now choose the Non-Clustered Index automatically. And guess what? The index is now chosen when you look at the execution plan.
SQL Server now performs a Seek operation on the Non-Clustered Index, but we still have an explicit Sort operator in the execution plan. And because of the 30% hard-coded Cardinality Estimation the Sort operator still spills over to TempDb. Ouch! Our logical reads have dropped down to 757, but the elapsed time is still at around 800ms. What do you do now?
We can now try to include the column CarrierTrackingNumber at first in the navigation structure of the Non-Clustered Index. This is the column on which SQL Server performs the Sort operation. When we have that column first in the Non-Clustered Index, we have a physical presorting of our data by that column, and therefore the explicit Sort operator should go away. And as a positive side-effect there is nothing to spill over to TempDb. And no operator in the execution plan cares now about the wrong Cardinality Estimation. So let’s try that assumption by recreating the Non-Clustered Index again:
CREATE NONCLUSTERED INDEX idx_Test ON Sales.SalesOrderDetail(CarrierTrackingNumber, ProductID) INCLUDE (OrderQty, UnitPrice, UnitPriceDiscount) WITH ( DROP_EXISTING = ON ) GO
As you can see from the index definition, we have now physically presorted our data by the columns CarrierTrackingNumber and ProductID. When you now rerun the query, and when you have a look at the execution plan, you can see that the explit Sort operator has gone away, and that SQL Server scans the complete leaf level of the Non-Clustered Index (with a residual predicate for the search predicate).
That plan isn’t that bad! We just need 764 logical reads, and the elapsed time for this query is now down to 600ms. That’s a 25% improvement compared with previously! BUT: the Query Optimizer suggests to us a better Non-Clustered Index through the *great* (?) feature of the Missing Index Recommendations! Because we trust the Query Optimizer blindly, we create that recommended Non-Clustered Index:
CREATE NONCLUSTERED INDEX [SQL Server doesn't care about names, why I should care about names?] ON [Sales].[SalesOrderDetail] ([ProductID]) INCLUDE ([SalesOrderID],[SalesOrderDetailID],[CarrierTrackingNumber],[OrderQty],[LineTotal]) GO
When you now rerun the original query you will see amazing things: the Query Optimizer uses *OUR* previously created Non-Clustered Index, and the Missing Index Recommendation has gone away! You have just created an Index that is never used by SQL Server – except for INSERT, UPDATE, and DELETE statements where SQL Server has to maintain your Non-Clustered Index. You just have created *pure* overhead for your database. But on the other hand you have satisfied the Query Optimizer by eliminating the Missing Index Recommendation. But that’s *NOT* the goal: the goal is to create indexes that are *ALSO* used.
Conclusion: never, ever trust the Query Optimizer!
Today’s blog posting was a little bit controversial, but I wanted to show you how the Query Optimizer can help you when you work on your indexing strategy, and how the Query Optimizer can fool you when you work on your indexing strategy. Therefore it is very, vey important that you just make minor adjustments, and that you immediately test your change by running your query again. And when you use a Missing Index Recommendation from the Query Optimizer, please think whether the recommendation is a good one. As I have said – I don’t want to think. Ouch...
Thanks for your time,