In today’s blog posting I want to talk about a very specific performance problem related to indexing in SQL Server.
Imagine the following simple query, which you have already seen hundreds of times in your SQL Server life:
-- Results in an Index Scan SELECT * FROM Sales.SalesOrderHeader WHERE YEAR(OrderDate) = 2005 AND MONTH(OrderDate) = 7 GO
With that simple query, we request sales information for a specific month in a specific year. Not very complicated. Unfortunately that query doesn’t perform very well – even with a Non-Clustered Index on the column OrderDate. When you look at the execution plan, you can see that the Query Optimizer has chosen the Non-Clustered Index that is defined on the column OrderDate, but unfortunately SQL Server performs a complete Scan of the index, instead of performing an efficient Seek operation.
This isn’t really a limitation of SQL Server, this is the way how relational databases are working and thinking :-). As soon as you are applying an expression (function calls, calculations) on an indexed column (a so-called Search Argument), the database engine HAS TO SCAN that index, instead of performing a seek operation.
To get a scalable seek operation in the execution plan, you have to rewrite your query in a way to avoid the call of the function DATEPART:
-- Results in an Index Seek SELECT * FROM Sales.SalesOrderHeader WHERE OrderDate >= '20050701' AND OrderDate < '20050801' GO
As you can see from the rewritten query, the query returns the same result, but we have just eliminated the function call of DATEPART. When you look at the execution plan, you can see that SQL Server performs a seek operation - in that specific case it is a so-called Partial Range Scan: SQL Server seeks to the first value, and scan until he hits the last value of the requested range. If you have to call functions in the context of indexed columns, you should always make sure that these function calls are performed on the right hand side of your column in the query. Let’s have a look at a concrete example. The following query casts the indexed column CreditCardID to the data type CHAR(4):
-- Results in an Index Scan SELECT * FROM Sales.SalesOrderHeader WHERE CAST(CreditCardID AS CHAR(4)) = '1347' GO
When you have a more detailed look on the execution, you can see that SQL Server scans again the whole Non-Clustered Index. Not really scalable if your table gets larger and larger. If you are performing that conversation on the right hand side of your indexed column in the query, you can again eliminate the function call on the indexed column, and SQL Server is able to perform a seek operation:
-- Results in an Index Seek SELECT * FROM Sales.SalesOrderHeader WHERE CreditCardID = CAST('1347' AS INT) GO
As you have seen in this blog posting, it is very important that you are NOT calling any function directly or indirectly on your indexed columns. Otherwise SQL Server has to scan your index, instead of performing an efficient seek operation. And scans will never ever scale, when your table gets more and more rows.
Please feel free to leave a comment, if you want to share other good examples where you have encountered this specific behavior.
Thanks for reading!