Cardinality Estimation for Correlated Columns

(Be sure to check out the FREE SQLpassion Performance Tuning Training Plan, where you get all the essential knowledge about performance tuning on SQL Server week by week via email.)

In today’s blog posting I want to talk about a very specific problem in cardinality estimation: dealing with correlated columns in a search predicate. In the first step we will look at the behavior that SQL Server has used since version 7.0, and afterwards we will have a more detailed look at SQL Server 2014, which implements a completely new way of handling correlated columns during Query Optimization.

Correlated Columns – what?

Before we go into the details of that specific problem, we first have to clarify what correlated columns are. The Query Optimizer used by SQL Server is based on 4 core assumptions:

  • Independence
  • Uniformity
  • Containment
  • Inclusion

I don’t want to elaborate on each assumption in detail, because they are described very well in other whitepapers. You will find a link to such a whitepaper in the summary section of this blog posting. The assumption we will concentrate on today is the 1st one – Independence. Independence means that the columns used in a search predicate (the WHERE clause) are assumed to be independent of each other: the values in one column tell the optimizer nothing about the values in another, so each predicate is expected to filter different records when evaluated alone. Unfortunately this assumption is not always true. Let’s have a look at a more concrete example, where that assumption is violated. Assume the following 2 queries:

SELECT * FROM Sales.SalesOrderHeader
WHERE SalesOrderID > 74000 AND SalesOrderID < 75000
GO

SELECT * FROM Sales.SalesOrderHeader
WHERE OrderDate >= '20080626' AND OrderDate <= '20080724'
GO

In the first query we return 999 rows from the AdventureWorks2012 database by restricting on a specific range of SalesOrderID values, and the second query returns 1125 records based on the OrderDate column. But there are in sum 912 rows where BOTH search predicates evaluate to TRUE, which means there is a strong correlation between both columns. The Query Optimizer, however, isn’t aware of this correlation.

SQL Server 7.0 – 2012 behavior

Let’s now have a look at how that correlation is handled from SQL Server 7.0 up to SQL Server 2012. When you look at the execution plans of both queries, you can see that the Query Optimizer estimates 999.936 rows for the first query and 1125 rows for the second one.

Cardinality Estimation for a single search predicate

With that information and the cardinality of our table (the number of records our table holds – 31465 in our case), we can calculate the so-called Selectivity of each search predicate. The selectivity is just a number between 0 and 1: the smaller the selectivity, the fewer records are returned by the query (0.0 means 0% of the rows are returned, 1.0 means 100% of the rows are returned). We calculate the selectivity of a search predicate by dividing the estimated number of rows by the number of records stored in our table. The selectivity of the first search predicate is therefore 0.03177931034482758620689655172414 (999.936 / 31465 table rows), and the selectivity of the second search predicate is 0.03575401239472429683775623708883 (1125 / 31465 table rows).
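
The following small sketch (assuming the AdventureWorks2012 sample database) derives the actual selectivity of each predicate in the same way – the matching row count divided by the table cardinality:

-- Actual selectivity of both search predicates: matching rows / table cardinality
DECLARE @TableRows FLOAT = (SELECT COUNT(*) FROM Sales.SalesOrderHeader)

SELECT
	(SELECT COUNT(*) FROM Sales.SalesOrderHeader
	 WHERE SalesOrderID > 74000 AND SalesOrderID < 75000) / @TableRows AS SelectivityPredicate1,
	(SELECT COUNT(*) FROM Sales.SalesOrderHeader
	 WHERE OrderDate >= '20080626' AND OrderDate <= '20080724') / @TableRows AS SelectivityPredicate2
GO

Imagine now what happens if we use both search predicates combined with an AND operator (a so-called Conjunction):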

SELECT * FROM Sales.SalesOrderHeader
WHERE SalesOrderID > 74000 AND SalesOrderID < 75000
AND OrderDate >= '20080626' AND OrderDate <= '20080724'
GO

When you look at the execution plan, the Query Optimizer estimates just 35.7517 rows at the Clustered Index Seek (Clustered) operator!

Wrong Cardinality Estimation for correlated columns

In reality the query returns the 912 rows mentioned earlier. That’s a huge discrepancy. The Query Optimizer simply multiplies the selectivity values of both search predicates with the cardinality of our table to produce the final row estimation. SQL Server just assumes that both search predicates filter DIFFERENT rows – Independence is assumed between both columns:

0.03177931034482758620689655172414 * 0.03575401239472429683775623708883 * 31465 = 35.7517241379310344827586206896586

Of course, the reality is completely different, because there is a huge correlation between both search predicates. Therefore you see the large discrepancy between the Estimated and the Actual Number of Rows. The more individual search predicates you combine in a conjunction, the larger the discrepancy will become. When the final estimation falls below 1.0 rows, the Query Optimizer will always estimate at least 1 row – 0 rows are never estimated.

SQL Server 2014 behavior

As you might have heard, SQL Server 2014 ships with a new Cardinality Estimator. As soon as your database runs under compatibility level 120, the new Cardinality Estimator is used. Be aware of that fact when you restore or attach a database coming from an earlier version of SQL Server – the compatibility level is still the old one there! If you want to use the new Cardinality Estimator without changing the compatibility level, you can also use the new Trace Flag 2312.
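
If you prefer the compatibility level route, a minimal sketch (assuming the AdventureWorks2012 database) looks like this:

-- Switch the database to compatibility level 120,
-- so that the new Cardinality Estimator is used by default
ALTER DATABASE AdventureWorks2012
SET COMPATIBILITY_LEVEL = 120
GO

Let’s now run the above query with both search predicates under the new Cardinality Estimator by applying Trace Flag 2312 to the query.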

SELECT * FROM Sales.SalesOrderHeader
WHERE SalesOrderID > 74000 AND SalesOrderID < 75000
AND OrderDate >= '20080626' AND OrderDate <= '20080724'
OPTION (RECOMPILE, QUERYTRACEON 2312)
GO

When you look at the execution plan, you can see that the Cardinality Estimation has changed.

Softened Cardinality Estimation for correlated columns in SQL Server 2014

The new Cardinality Estimator now estimates 188.898 rows – much more than the 35.75 rows of the old one, but still a huge gap to the 912 rows that the query returns in reality. But which formula does the new Cardinality Estimator use? It uses a so-called Exponential Back-off algorithm: the Query Optimizer takes the 4 most selective search predicates, sorted by their selectivity (the most selective one first). All selectivity values are again multiplied by each other, but the difference is that every subsequent value is softened by taking an ever larger root of it. Let’s have a look at the formula to understand this behavior:

c0 * (c1 ^ (1/2)) * (c2 ^ (1/4)) * (c3 ^ (1/8))

When we apply this formula to our concrete example, we can derive the final cardinality estimation with the following calculation:

0.03177931034482758620689655172414 * SQRT(0.03575401239472429683775623708883) * 31465 = 189.075212620762

There is a small discrepancy between our calculation and the estimation of 188.898 rows, because the estimation of 999.936 rows was rounded in the execution plan provided by SQL Server. By using the Exponential Back-off algorithm, the Query Optimizer produces a better estimation and narrows the gap between the Estimated and the Actual Number of Rows whenever there is a correlation between the applied search arguments.
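
You can verify both calculations with a small T-SQL sketch (the selectivity values are simply the ones derived above):

-- Legacy estimation vs. Exponential Back-off, based on the selectivities from above
DECLARE @Sel1 FLOAT = 999.936 / 31465	-- selectivity of the SalesOrderID predicate
DECLARE @Sel2 FLOAT = 1125.0 / 31465	-- selectivity of the OrderDate predicate

SELECT
	@Sel1 * @Sel2 * 31465 AS LegacyEstimation,			-- ~35.75 rows
	@Sel1 * SQRT(@Sel2) * 31465 AS ExponentialBackOffEstimation	-- ~189.08 rows
GO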

Summary

In this blog posting we have looked at one specific problem in cardinality estimation in relational databases: handling the correlation between columns that are used in search predicates. Prior to SQL Server 2014, the Query Optimizer used a very simple approach by just multiplying the various selectivity values. This can lead to huge underestimations, which can cause you trouble when an operator further up in the execution plan relies on these estimations (e.g. a Sort or Hash operator).

The new Cardinality Estimator of SQL Server 2014 provides an enhanced mechanism for this specific problem by using the Exponential Back-off formula during cardinality estimation, which produces better estimations. But there are still discrepancies compared to the Actual Number of Rows at runtime. If you want to read more about the new Cardinality Estimator of SQL Server 2014, I suggest reading the excellent whitepaper Optimizing your Query Plans with the SQL Server 2014 Cardinality Estimator written by Joe Sack (Blog, Twitter).

Thanks for reading!

-Klaus

 

An ever-increasing Clustered Key value doesn’t scale!

Do you know all the best practices for choosing a Clustered Key value? A good Clustered Key value should have the following 3 properties:

  • Narrow
  • Static
  • Ever Increasing

Let’s have a more detailed look at all 3 properties, and at why an ever-increasing value doesn’t really scale in SQL Server.

Narrow

A Clustered Key value should be as small as possible. Why? Because it takes up space, and the Clustered Key value also serves as the logical pointer in the leaf level of every Non-Clustered Index. If you have a very wide Clustered Key value, you also get larger Non-Clustered Indexes. If you have defined a Non-Unique Non-Clustered Index (which is normally the case), the Clustered Key value is also part of the navigation structure of your Non-Clustered Index. Therefore the overhead in your indexes gets very large, and our goal is to minimize the overhead on our index pages – overhead that we have to pay for in physical storage, and also in the Buffer Pool, where SQL Server caches the index pages read from storage.

Normally you choose a technical key value (like an INT/BIGINT data type) instead of a natural key value. Over the years I have seen Clustered Key values with lengths of 100 bytes and more (combinations of LastName, FirstName, SocialSecurityNumber, etc.). Believe me – you are just wasting memory! You don’t have to do that. Choose a technical key, and you are fine.

Static

Because the Clustered Key value is replicated into every Non-Clustered Index, your Clustered Key value should never ever change! Otherwise SQL Server has to transparently maintain EVERY Non-Clustered Index defined on your table within your UPDATE execution plan. You are again just introducing additional computational overhead that you don’t need. Use your CPU cycles for other, more important stuff. As you know, natural key values can change (like a LastName column, when someone gets married).

A technical key value (like an INT IDENTITY) can’t change (by default). Therefore the logical pointer (in the form of the Clustered Key value) in all your Non-Clustered Indexes remains stable – without any need to change it – forever!

Ever Increasing

And the 3rd and final important property of a “good” Clustered Key value is that the chosen column should give you ever-increasing values. Why? Because you are then always adding new records at the end of your Clustered Index, and therefore you avoid expensive Page Splits (regarding CPU cycles and Transaction Log overhead) and Index Fragmentation. In 99% of all cases you will be fine with an ever-increasing value like an INT IDENTITY column, but there are some scenarios where this approach can lead to serious scalability problems. Imagine you have a workload where a lot of different users are permanently inserting into the same table with an ever-increasing Clustered Key value. Just think for a second about Logging/Auditing tables.

Let’s have a more detailed look at what happens internally in SQL Server when you read and write pages in memory. When SQL Server accesses certain memory structures (like pages that are stored in the Buffer Pool), these memory accesses must be synchronized among multiple threads. You can’t write concurrently to the same page in memory, and when a thread writes to a page, no other thread can read the page at the same time. In traditional concurrent programming you solve that problem with mutexes – like a Critical Section: certain code paths are just mutually exclusive. A Latch in SQL Server is almost the same as a Critical Section, and latches are used to synchronize threads/queries with each other. Every time you read a page, the worker thread has to acquire a Shared Latch (SH); every time you write a page, the worker thread has to acquire an Exclusive Latch (EX). And both latches are incompatible with each other.

When you now perform an INSERT statement, the worker thread exclusively latches the page where the INSERT statement occurs. In the meantime no one else can read from or write to this page. With an ever-increasing Clustered Key value this approach doesn’t really scale, because you are always inserting your records at the end of your Clustered Index. Therefore your parallel threads/queries contend for an Exclusive Latch on the same last page of your Clustered Index. As a side-effect SQL Server executes your INSERT statements serially – one INSERT after the other. You have hit the famous Last Page Insert Latch Contention. Let’s have a look at the following picture.

Last Page Insert Latch Contention
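
If you suspect this problem on your own system, a small diagnostic sketch against the waiting tasks DMV can confirm it (run it while the concurrent INSERT workload is active):

-- Many sessions waiting with the same resource_description (the last page
-- of the Clustered Index) indicate Last Page Insert Latch Contention
SELECT
	session_id,
	wait_type,
	wait_duration_ms,
	resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH_%'
GO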

With the best practice of an ever-increasing Clustered Key value you have a single hotspot at the end of your Clustered Index. The smaller your records are, the more contention you are introducing here. How can you solve that problem? Easy: spread your INSERT statements across the whole B-Tree structure of the Clustered Index. There are various approaches for achieving that (a small sketch of one of them follows the list):

  • Use a random Clustered Key value (like a UNIQUEIDENTIFIER). But be aware of the side-effects: a larger logical pointer in EVERY Non-Clustered Index, Page Splits…
  • Implement Hash Partitioning, if you are using the Enterprise Edition of SQL Server.
  • Eliminate latching through the use of In-Memory OLTP, which is part of SQL Server 2014.
  • Use a so-called Reverse Index. Unfortunately SQL Server doesn’t provide that kind of index out-of-the-box, like Oracle does, but you can implement it on your own.
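
As an illustration of spreading the INSERT statements, the following minimal sketch (the table and column names are made up) puts a computed, persisted hash bucket in front of an IDENTITY value in the Clustered Key, so that concurrent INSERT statements are distributed across 16 insertion points in the B-Tree instead of one single last page:

-- A computed hash bucket as the leading column of the Clustered Key
CREATE TABLE dbo.AuditLog
(
	LogID BIGINT IDENTITY(1, 1) NOT NULL,
	HashBucket AS CAST(LogID % 16 AS TINYINT) PERSISTED NOT NULL,
	LogText NVARCHAR(256) NOT NULL,
	CONSTRAINT PK_AuditLog PRIMARY KEY CLUSTERED (HashBucket, LogID)
)
GO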

Summary

In 99% of all cases you will be fine with a narrow, static, and ever-increasing Clustered Key value like an INT IDENTITY data type. But in some rare scenarios where you need a huge amount of parallel INSERT statements (like Logging/Auditing tables), you can hit the Last Page Insert Latch Contention with that approach. If you hit that specific problem, you have to leave your comfort zone and make sure that you spread the INSERT statements across your whole B-Tree structure. You are mainly fighting against a limitation of how multi-threaded access works in a traditional B-Tree structure.

I hope this blog posting has given you a good insight into why ever-increasing Clustered Key values can hurt the scalability of your tables.

If you are more interested in how to choose the right Clustered Key Value, I’m also offering a 1-hour long training video through the SQLpassion Online Academy.

Thanks for reading!

-Klaus

Improving Query Performance by using correct Search Arguments

In today’s blog posting I want to talk about a very specific performance problem related to indexing in SQL Server.

The Problem

Imagine the following simple query, which you have already seen hundreds of times in your SQL Server life:

-- Results in an Index Scan
SELECT * FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2005 AND MONTH(OrderDate) = 7
GO

With that simple query we request the sales information for a specific month in a specific year. Not very complicated. Unfortunately the query doesn’t perform very well – even with a Non-Clustered Index on the column OrderDate. When you look at the execution plan, you can see that the Query Optimizer has chosen the Non-Clustered Index defined on the column OrderDate, but unfortunately SQL Server performs a complete Scan of the index instead of an efficient Seek operation.

Non Clustered Index Scan

This isn’t really a limitation of SQL Server, this is just the way relational databases work and think :-) . As soon as you apply an expression (a function call, a calculation) to an indexed column in your search predicate, the predicate is no longer a valid Search Argument, and the database engine HAS TO SCAN that index instead of performing a seek operation.

The Solution

To get a scalable seek operation in the execution plan, you have to rewrite your query in a way that avoids the calls to the functions YEAR and MONTH (which SQL Server internally translates to DATEPART calls):

-- Results in an Index Seek
SELECT * FROM Sales.SalesOrderHeader
WHERE OrderDate >= '20050701' AND OrderDate < '20050801'
GO

As you can see, the rewritten query returns the same result, but we have eliminated the function calls on OrderDate. When you look at the execution plan, you can see that SQL Server now performs a seek operation – in this specific case a so-called Partial Range Scan: SQL Server seeks to the first value, and scans until it hits the last value of the requested range. If you have to call functions in the context of indexed columns, you should always make sure that the function call is performed on the right-hand side of the comparison – not on the indexed column itself. Let’s have a look at a concrete example. The following query casts the indexed column CreditCardID to the data type CHAR(4):

-- Results in an Index Scan
SELECT * FROM Sales.SalesOrderHeader
WHERE CAST(CreditCardID AS CHAR(4)) = '1347'
GO

When you have a more detailed look at the execution plan, you can see that SQL Server again scans the whole Non-Clustered Index. Not really scalable when your table gets larger and larger. If you perform that conversion on the right-hand side of the comparison instead, you eliminate the function call on the indexed column, and SQL Server is able to perform a seek operation:

-- Results in an Index Seek
SELECT * FROM Sales.SalesOrderHeader
WHERE CreditCardID = CAST('1347' AS INT)
GO

Another nice example where you can run into the same problem is the use of the CASE expression in SQL Server. Let’s have a look at the following query, where the column PersonType is indexed through a Non-Clustered Index:

-- Results in an Index Scan
SELECT
	CASE PersonType
		WHEN 'IN' THEN 1
		WHEN 'EM' THEN 2
		WHEN 'SP' THEN 3
		WHEN 'SC' THEN 4
		WHEN 'VC' THEN 5
		WHEN 'GC' THEN 6
	END AS ConvertedPersonType
FROM Person.Person
GO

Here you are again calling a function on an indexed column – in our case the CASE expression. You can’t directly see that function call in the T-SQL query, but internally it’s nothing more than a function call, and you again get a Scan of the Non-Clustered Index instead of a Seek operation. How can you avoid that problem? You can rewrite the query to get rid of the CASE expression. One option is a join against a Common Table Expression that stores the needed lookup values. Let’s have a look at the rewritten query:

-- Results in an Index Seek
WITH LookupCTE AS
(
	SELECT * FROM
	(
		VALUES
			(N'IN', 1),
			(N'EM', 2),
			(N'SP', 3),
			(N'SC', 4),
			(N'VC', 5),
			(N'GC', 6)
	) tbl (PersonType, Value)
)
SELECT cte.Value FROM Person.Person p
INNER JOIN LookupCTE cte ON cte.PersonType = p.PersonType
GO

With that approach, SQL Server is able to perform a seek operation on the indexed column PersonType.
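
As a side note, the seek is of course only possible when such a Non-Clustered Index actually exists. In case your AdventureWorks copy doesn’t have one yet, here is a minimal sketch (the index name is made up):

-- Supporting Non-Clustered Index on the column PersonType
CREATE NONCLUSTERED INDEX idx_Person_PersonType
ON Person.Person (PersonType)
GO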

Summary

As you have seen in this blog posting, it is very important that you do NOT call any function directly or indirectly on your indexed columns. Otherwise SQL Server has to scan your index instead of performing an efficient seek operation. And scans will never ever scale when your table gets more and more rows.

Please feel free to leave a comment, if you want to share other good examples where you have encountered this specific behavior.

Thanks for reading!

-Klaus

Goodbye SQL Server, Hello FileMaker…

The last weeks were very time consuming for me, because I had to make a very tough decision: staying independent and providing high quality SQL Server consulting and training services, or getting employed (again) and being part of a greater team to influence how the IT industry will look in the future.

After a very long time of thinking I have made my decision: starting today – on April 1 – I am joining Apple. I will be the lead architect for FileMaker, the relational database on the Apple platform. I will be in charge of the whole dev team, leading the various development teams in FileMaker, and also coordinating the necessary changes within Mac OS X. Our goal will be the integration of FileMaker into the OS, making HFS a transactional file system – the vision of WinFS, which was discarded by Microsoft…

Why me? I have a lot of experience in the area of relational databases (I know how to write basic SELECT * queries, which was also the toughest part of the SQL MCM certification). And of course the other most important factor why Apple has chosen me over other candidates is my hands-on experience in writing operating systems. As you might know, I have written my own OS, which boots from a FAT12 partition (I found some code somewhere on the Internet…), reads a key from the keyboard, and writes it out. Everything done in 32-bit Protected Mode, without any BIOS calls! Impressive, huh? Apple was really amazed by that piece of software, so we will also evaluate whether that code can be used as the future basis for Mac OS X 15.1.

Of course, Apple screened a lot of different candidates for this position, and I had very hard competition. In the final round there were only 3 of us left, and I felt like someone in the TV show “Deutschland sucht den Superstar” (Germany searches for the superstar). I had never heard anything about the other 2 candidates; the 1st one was a guy called Bob, and the name of the other one was Conor.

Bob performed really amazingly, but he had too little low-level experience. To illustrate that a little bit: one task that we had to perform during the final round was to join 2 tables together. A very simple task when you know the concept of a cartesian product. But Bob opened WinDbg and did everything with a graphical hex-dump. He violated the rules, because we were not allowed to use any GUI front-end… And Conor? He talked the whole day about trees and cardinality estimations… So my solution with the cartesian product was chosen over their approaches. I’m the lucky one!

So wish me luck in my new role, and let’s change the way relational databases work.

Thanks for reading and a happy 1st April!

-Klaus

SQL Server Quickie #12 – Nested Loop Operator

Today I have uploaded the 12th SQL Server Quickie to YouTube. This time I’m talking about the Nested Loop Join Operator in SQL Server. You can find the script used during the demonstration here as a download.

Thanks for watching!

-Klaus

The Illusion of Updateable Clustered ColumnStore Indexes

In today’s blog posting I want to concentrate in more detail on the Updateable Clustered ColumnStore Index that is introduced with SQL Server 2014. Before we go into the details, I want to give you a brief overview of their first appearance in SQL Server 2012, and their limitations.

ColumnStore Indexes in SQL Server 2012

The introduction of ColumnStore Indexes in SQL Server 2012 was one of the hottest new features (besides AlwaysOn). If chosen correctly, they can yield an impressive performance improvement for Data Warehousing workloads. Unfortunately they had 2 big limitations:

  • There was only support for Non-Clustered ColumnStore Indexes
  • As soon as you had created a ColumnStore Index on a table, the underlying table became read only, and no changes to the data were allowed anymore

Both limitations were a huge bummer for some customers. Imagine you had a table with 300 GB of traditional row store data. With a ColumnStore Index it’s possible to compress that data down to a size of 30 GB. But SQL Server 2012 only allowed a Non-Clustered ColumnStore Index, which means you had to store your data twice: once in the traditional row store format, and once in the new ColumnStore format. That’s a huge waste of storage, because your queries will (hopefully) only use the Non-Clustered ColumnStore Index.

And as soon as you had created your Non-Clustered ColumnStore Index, you were not allowed to change the underlying table data anymore – your table was just read only! Of course there were some workarounds for this problem, like Partition Switching, but you still had to implement that on your own…

ColumnStore Indexes in SQL Server 2014

Things are changing now with SQL Server 2014, because Microsoft has resolved the above mentioned issues – with some magic and illusion: SQL Server 2014 provides you an Updateable Clustered ColumnStore Index! Let’s have a more detailed look at how this magic and illusion happen internally in SQL Server.

The first and most important fact is that a direct update of a ColumnStore Index is not possible! It would be far too time consuming to completely decompress and recompress the data on the fly during your INSERT, UPDATE, and DELETE transactions. Therefore SQL Server 2014 gets help from some magic: Delta Stores and Delete Bitmaps. Let’s have a more detailed look at both concepts.

Every time you run an INSERT statement, the new record isn’t directly inserted into the ColumnStore Index – it is inserted into a Delta Store. The Delta Store itself is nothing else than a traditional B-Tree structure with all its pros and cons. When you afterwards read from the ColumnStore Index, SQL Server returns the data from the compressed ColumnStore Index AND from the Delta Store.

When you run a DELETE statement, again nothing happens in the compressed ColumnStore Index. The only thing that happens is that the record is deleted logically through a Delete Bitmap: every record in the ColumnStore Index has a corresponding bit in that Delete Bitmap. When you read your ColumnStore Index, SQL Server just discards the rows that are marked as deleted in the Delete Bitmap.

And running an UPDATE statement just means inserting the new version of the row into the Delta Store, and marking the old version as deleted in the Delete Bitmap. Easy, isn’t it? The following picture (source: http://research.microsoft.com/apps/pubs/default.aspx?id=193599) shows this concept in more detail.

ColumnStore

Because of the Delta Stores and the Delete Bitmap it only seems that your ColumnStore Index is updateable – in reality the compressed data is immutable. There is also a background process called the Tuple Mover, which runs regularly and finally pushes your changes asynchronously into the compressed ColumnStore Index.
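
You can watch this machinery at work through the Dynamic Management View sys.column_store_row_groups (a small sketch, assuming a table with a ColumnStore Index exists in the current database):

-- OPEN row groups are Delta Stores, COMPRESSED row groups contain the actual
-- ColumnStore data, and deleted_rows reflects the Delete Bitmap
SELECT
	OBJECT_NAME(object_id) AS TableName,
	row_group_id,
	state_description,
	total_rows,
	deleted_rows
FROM sys.column_store_row_groups
GO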

In addition, in SQL Server 2014 you can finally define a ColumnStore Index as Clustered. This means that you don’t need to keep your data in the traditional row store format anymore: just create your table and create a Clustered ColumnStore Index on it. With this approach you can achieve huge space savings in your storage, because everything is now compressed. When you work in more detail with the concepts of relational databases, Clustered always means Sorted. But with a Clustered ColumnStore Index this statement is not true: when you create a Clustered ColumnStore Index, there is NO sort order in your data! Just be aware of this tiny little fact ;-)
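
A minimal sketch of this approach (the table name is made up):

-- Create a table, and convert it into the ColumnStore format
CREATE TABLE dbo.FactSales
(
	SalesDate DATE NOT NULL,
	ProductID INT NOT NULL,
	Quantity INT NOT NULL,
	Amount MONEY NOT NULL
)
GO

CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales
GO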

Summary

Updateable Clustered ColumnStore Indexes are a nice magical illusion in SQL Server 2014. Don’t get me wrong: I really like the possibilities introduced with this new feature, but you have to understand how the feature is implemented internally if you want to make the best use of it.

Thanks for reading

-Klaus

Performance improvements in SQL Server 2014

SQL Server 2014 will be an amazing release regarding all the various performance enhancements that are part of this new release. Over the last months I have already done a huge amount of work and testing on the latest CTP version of SQL Server 2014, and I have already blogged about my experiences. In this blog posting I want to give you an overview of the various performance related enhancements that are introduced.

If you are interested in a more personal face-2-face training about the performance enhancements in SQL Server 2014, I’m also running my one-day SQL Server 2014 Performance Tuning Day around Europe in May & June, where you will learn the nifty details about these various enhancements.

Buffer Pool Extensions

The idea behind Buffer Pool Extensions is very simple: expand the Buffer Pool with a paging file that is stored on very fast storage, like SSD drives. The Buffer Pool Extensions come in quite handy if you don’t have the ability to add additional physical RAM to your database server.

Resource Governor

Resource Governor was first introduced back in SQL Server 2008, but it wasn’t really a mature technology, because you had no possibility to govern I/O operations on the storage level, and you also had no way to limit the size of the Buffer Pool for a specific workload group. With SQL Server 2014 things are changing, because you can now throttle I/O operations. Limiting Buffer Pool usage is still not possible, but hey, who knows what comes in SQL Server 2016 ;-) .
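
A minimal sketch of the new I/O throttling (the pool name is made up):

-- Throttle the I/O operations of a Resource Pool on a per-volume basis
CREATE RESOURCE POOL ReportingPool
WITH (MIN_IOPS_PER_VOLUME = 50, MAX_IOPS_PER_VOLUME = 200)
GO

ALTER RESOURCE GOVERNOR RECONFIGURE
GO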

Lock Priorities

As you might know, the Enterprise Edition of SQL Server gives you Online operations, or as I call them, “Almost Online Operations”. They are only almost online, because internally SQL Server still has to acquire some locks, which can lead to blocking situations. For that reason SQL Server 2014 introduces Lock Priorities, with which you can control how SQL Server should react if such a blocking situation occurs.
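
A minimal sketch of such a Lock Priority (the index and table names are made up):

-- Rebuild online, wait at low priority for at most 1 minute,
-- and abort the rebuild itself if blockers still remain
ALTER INDEX IX_Customers_LastName ON dbo.Customers REBUILD
WITH (ONLINE = ON (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = SELF)))
GO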

Clustered ColumnStore Indexes

One of the hottest enhancements in SQL Server 2014 is the introduction of Clustered ColumnStore Indexes, an amazing new concept for dealing with ColumnStore data in SQL Server. In addition, the Clustered ColumnStore Index can also be changed directly – without using tricks like Partition Switching.

In-Memory OLTP

With In-Memory OLTP Microsoft claims that the performance of your workload can be improved up to 100x. Awesome! Everything is now stored directly in memory, without touching your physical storage anymore (besides the transaction log, if you want). In addition, In-Memory OLTP is based on so-called Lock Free Data Structures, which means locking, blocking, latching, and spinlocking are just gone. Of course, there are side-effects and even limitations with this promising approach…
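
A minimal sketch of a memory-optimized table (all names are made up; the database first needs a MEMORY_OPTIMIZED_DATA filegroup):

-- Prepare the database for In-Memory OLTP
ALTER DATABASE TestDB ADD FILEGROUP InMemoryOLTP CONTAINS MEMORY_OPTIMIZED_DATA
GO
ALTER DATABASE TestDB ADD FILE (NAME = 'InMemoryOLTPData', FILENAME = 'd:\InMemoryOLTPData')
TO FILEGROUP InMemoryOLTP
GO

-- A memory-optimized table with a Hash Index on the primary key
CREATE TABLE dbo.HotSessions
(
	SessionID INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
	Payload NVARCHAR(256) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO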

Delayed Transactions

It doesn’t matter how good the throughput of your workload is, the final barrier and bottleneck is almost always the transaction log. Because of the Write-Ahead Logging mechanism used by SQL Server, a transaction must always be written physically to the transaction log before the transaction is committed. When your transaction log sits on slow storage, your performance and throughput will suffer. For that reason SQL Server 2014 implements so-called Delayed Transactions.
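
A minimal sketch of their use (the database and table names are made up):

-- Allow delayed durable transactions in the database,
-- and request delayed durability for one specific transaction
ALTER DATABASE TestDB SET DELAYED_DURABILITY = ALLOWED
GO

BEGIN TRANSACTION
INSERT INTO dbo.EventLog (Message) VALUES (N'Some event')
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON)
GO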

Cardinality Estimation

Cardinality Estimation is the most important thing in a relational database, because these estimations are fed into the Query Optimizer, whose job it is to produce a good-enough execution plan. With SQL Server 2014 Microsoft has rewritten the Cardinality Estimator completely from scratch to overcome some limitations rooted in the history of this very important component.

Thanks for reading

-Klaus

Buffer Pool Extensions in SQL Server 2014

In today’s blog posting I want to talk about the Buffer Pool Extensions that are introduced with SQL Server 2014. As you know, the Buffer Pool is one of the main memory consumers in SQL Server. When you read data from your storage, the data is cached in the Buffer Pool. SQL Server also caches Execution Plans in the Plan Cache, which is part of the Buffer Pool as well. The more physical memory you have, the larger your Buffer Pool will be (configured through the Max Server Memory setting).

A lot of SQL Server customers are dealing with the problem that physical memory in a database server is restricted: when all memory slots are already occupied, how do you add additional memory to the physical server? Of course you can migrate to a larger server, but that’s another story… The solution to this specific problem is the introduction of Buffer Pool Extensions in SQL Server 2014. With the help of the Buffer Pool Extensions, SQL Server introduces another layer into the memory hierarchy. Let’s have a look at the following picture:

Memory Hierarchy with Buffer Pool Extensions

As you can see, at the top you have the Buffer Pool itself, which is very fast (regarding latency times), and at the bottom you can see our traditional storage, which is comparatively slow. The Buffer Pool Extensions sit right between the two – between the traditional Buffer Pool and our storage. The Buffer Pool Extensions consist of one simple file (the so-called Extension File) that should be stored on very fast storage – like an SSD drive. The Extension File works much like the page file in the Windows OS. Instead of adding additional physical memory to your database server, you just configure an Extension File on an SSD drive – that’s it!

Before I talk about the configuration and the concrete use of the Buffer Pool Extensions, I want to talk a little bit about the architecture and design behind them. The traditional Buffer Pool of SQL Server always differentiates between clean and dirty pages. A clean page is a page which has the same content as the page in the storage. A dirty page has been changed in memory, but hasn’t yet been written to the storage. Around every minute the so-called CHECKPOINT process writes dirty pages out to the storage, which means that a dirty page becomes a clean page again.

The Buffer Pool Extensions themselves are only used when the Buffer Pool of SQL Server gets into memory pressure. Memory pressure means that SQL Server needs more memory than is currently available. In that case the Buffer Pool evicts the pages that were least recently used – SQL Server applies a Least Recently Used (LRU) policy here. If you have configured an Extension File, SQL Server will write these pages into it, instead of writing them directly out to our slow storage. If the page is a dirty one, it is also concurrently written to the physical storage (through an asynchronous I/O operation). Therefore you can’t lose any data when you are working with the Buffer Pool Extensions. At some point in time your Extension File will also be completely full. In that case SQL Server has to evict older pages from the Extension File (again through the LRU policy), and finally writes them to the physical storage. The Extension File just acts as an additional layer between the Buffer Pool and the storage itself.

Let’s now have a look at how to configure the Buffer Pool Extensions in SQL Server 2014. SQL Server offers you the command ALTER SERVER CONFIGURATION SET BUFFER POOL EXTENSION for this. Let’s have a more detailed look at how you use it:

ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
(
   FILENAME = 'd:\ExtensionFile.BPE',
   SIZE = 10 GB
)
GO

The first restriction that you hit here is the fact that the Extension File must be at least the same size as the Buffer Pool itself. If you specify a smaller file size, you will get a lovely error message from SQL Server:


Msg 868, Level 16, State 1, Line 1
Buffer pool extension size must be larger than the current memory allocation threshold 1596 MB. Buffer pool extension is not enabled.

The next restriction that you will definitely hit is that you can’t change the size of the Extension File during the runtime of SQL Server. For example, when you want to grow the Extension File, you have to disable the Buffer Pool Extensions and re-enable them with the larger size. Of course you will have a performance degradation during this operation, because you are disabling an important caching layer used by SQL Server!
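
A minimal sketch of growing the Extension File: disable the Buffer Pool Extensions, and re-enable them with the new, larger size.

ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION OFF
GO

ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
(
   FILENAME = 'd:\ExtensionFile.BPE',
   SIZE = 20 GB
)
GO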

Be aware of this fact, when you are planning a deployment of the Buffer Pool Extensions for your production environment!!!

In addition, you are not able to reduce the size of the Extension File – the new file must always be larger than the previous one. Otherwise you again get an error message:


Msg 868, Level 16, State 1, Line 3
Buffer pool extension size must be larger than the current memory allocation threshold 4096 MB. Buffer pool extension is not enabled.

The whole configuration of the Buffer Pool Extensions can also be retrieved through the Dynamic Management View sys.dm_os_buffer_pool_extension_configuration.
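
A small sketch: retrieve the current configuration, and check how many cached pages currently live in the Extension File.

-- Current Buffer Pool Extension configuration
SELECT path, state_description, current_size_in_kb
FROM sys.dm_os_buffer_pool_extension_configuration
GO

-- Number of cached pages that currently live in the Extension File
SELECT COUNT(*) AS PagesInExtensionFile
FROM sys.dm_os_buffer_descriptors
WHERE is_in_bpool_extension = 1
GO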

When should you use the Buffer Pool Extensions? Microsoft recommends them for read-heavy OLTP workloads. You should not look at the Buffer Pool Extensions when you are dealing with a DWH/BI related workload – an Extension File doesn’t make sense there. And when we are talking about the Extension File itself: dedicate a very fast SSD to it! Traditional rotational hard disks are a no-go!

Thanks for reading!

-Klaus

SQL Server Quickie #11 – Non Clustered Indexes

Today I have uploaded the 11th SQL Server Quickie to YouTube. This time I’m talking about Non Clustered Indexes in SQL Server. You can find the script used during the demonstration here as a download.

Thanks for watching!

-Klaus

Lock Escalations

In today’s blog posting I want to talk about Lock Escalations in SQL Server. Lock Escalation is an optimization technique used by SQL Server to control the number of locks that are held within its Lock Manager. Let’s start with a description of the so-called Lock Hierarchy in SQL Server, because that hierarchy is the reason why the concept of Lock Escalation exists in a relational database like SQL Server.

Lock Hierarchy

The following picture shows you the lock hierarchy used by SQL Server:

Lock Hierarchy in SQL Server

As you can see from the picture, the lock hierarchy starts at the database level and goes down to the row level. You always have a Shared Lock (S) on the database level itself. When your query is connected to a database (e.g. USE MyDatabase), that Shared Lock prevents the database from being dropped, and prevents a backup from being restored over it. Underneath the database level you have locks on the table, on the pages, and on the records, when you are performing an operation.

When you execute a SELECT statement, you have an Intent Shared Lock (IS) at the table and page level, and a Shared Lock (S) on the records themselves. When you perform a data modification statement (INSERT, UPDATE, DELETE), you have an Intent Exclusive or Intent Update Lock (IX or IU) at the table and page level, and an Exclusive or Update Lock (X or U) on the changed records. SQL Server always acquires locks from top to bottom to prevent so-called Race Conditions, when multiple threads try to acquire locks concurrently within the locking hierarchy. Imagine now how the lock hierarchy would look when you perform a DELETE operation against 20,000 rows of a table. Let’s assume that a row is 400 bytes long, which means that 20 records fit onto one 8kb page:

Without Lock Escalation

You have one S lock on the database, 1 IX lock on the table, 1,000 IX locks on the pages (the 20,000 records are spread across 1,000 pages), and finally 20,000 X locks on the records themselves. In sum you have acquired 21,002 locks for the DELETE operation. Every lock needs 96 bytes of memory in SQL Server, so we are looking at about 1.9 MB of locks for just 1 simple query. This does not scale indefinitely when you run multiple queries in parallel. For that reason SQL Server implements the so-called Lock Escalation.
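
You can observe these locks yourself with a small sketch against sys.dm_tran_locks (the table and column names are made up; depending on whether the Lock Escalation described below has already kicked in, you will see either thousands of KEY/PAGE locks or one single OBJECT lock):

-- Delete a large number of rows in an open transaction,
-- and count the locks held by the current session
BEGIN TRANSACTION

DELETE FROM dbo.BigTable WHERE Col1 < 20000

SELECT resource_type, request_mode, COUNT(*) AS LockCount
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
GROUP BY resource_type, request_mode

ROLLBACK TRANSACTION
GO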

Lock Escalations

As soon as you have more than 5,000 locks on one level of the locking hierarchy, SQL Server escalates those many fine-granularity locks into one simple coarse-granularity lock. By default SQL Server *always* escalates to the table level. This means that the locking hierarchy from the previous example looks like the following after the Lock Escalation has been performed.

With Lock Escalation

As you can see, you have only one big lock on the table itself. In the case of the DELETE operation, it is one Exclusive Lock (X) at the table level. This hurts the concurrency of your database in a very negative way! Holding an Exclusive Lock at the table level means that no other session is able to access that table anymore – every other query will just block. When you run your SELECT statement in the Repeatable Read isolation level, you also hold your Shared Locks until the end of the transaction, which means you get a Lock Escalation as soon as you have read more than 5,000 rows. The result is one Shared Lock on the table itself! Your table is temporarily read-only, because every other data modification on that table will be blocked!

There is also a misconception that SQL Server escalates from the row level to the page level, and finally to the table level. Wrong! Such a code path doesn’t exist in SQL Server! By default SQL Server *always* escalates directly to the table level – an escalation policy to the page level simply doesn’t exist. If your table is partitioned (Enterprise Edition only!), then you can configure an escalation to the partition level. But you have to test your data access pattern very carefully here, because a Lock Escalation to the partition level can cause deadlocks. Therefore this option is not enabled by default.

Since SQL Server 2008 you can also control how SQL Server performs the Lock Escalation – through the ALTER TABLE statement and the LOCK_ESCALATION property. There are 3 different options available:

  • TABLE
  • AUTO
  • DISABLE

-- Controlling Lock Escalation
ALTER TABLE Person.Person
SET
(
	LOCK_ESCALATION = AUTO -- or TABLE or DISABLE
)
GO

The default option is TABLE, which means that SQL Server *always* performs the Lock Escalation to the table level – even when the table is partitioned. If your table is partitioned and you want a Lock Escalation to the partition level (because you have tested your data access pattern, and you don’t cause deadlocks with it), then you can change the option to AUTO. AUTO means that the Lock Escalation is performed to the partition level if the table is partitioned, and otherwise to the table level. And with the option DISABLE you can completely disable Lock Escalation for that specific table. But disabling Lock Escalations is not the best option in most cases, because the Lock Manager of SQL Server can then consume a huge amount of memory if you are not very careful with your queries and your indexing strategy.

Conclusion

Lock Escalation in SQL Server is mainly a nightmare. How do you delete more than 5,000 rows from a table without running into Lock Escalations? You can disable Lock Escalation temporarily, but you have to be very careful with that. Another option (the one I suggest) is to run your DELETE/UPDATE statements in a loop as separate transactions: DELETE/UPDATE fewer than 5,000 rows per iteration, so that you prevent Lock Escalations. As a very nice side-effect your one huge transaction is split into multiple smaller ones, which will also help with Auto Growth issues that you might have on your transaction log.
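
A minimal sketch of such a loop (the table and column names are made up):

-- Delete in small, separate transactions of fewer than 5,000 rows each,
-- so that no single transaction triggers a Lock Escalation
WHILE (1 = 1)
BEGIN
	DELETE TOP (4000) FROM dbo.BigTable
	WHERE Col1 < 20000

	IF @@ROWCOUNT = 0
		BREAK
END
GO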

Thanks for reading

-Klaus