SQL Server Quickie #17 – Round Robin Allocations

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

Today I have uploaded the 17th SQL Server Quickie to YouTube. This time I’m talking about Round Robin Allocations operations in SQL Server.

Like or share to get the source code.

Thanks for watching!

-Klaus

Do you like this post? Why not share it with the SQL Server community?

The dangerous beauty of the PIVOT operator in SQL Server

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

In today’s blog posting I want to discuss one specific T-SQL language construct in SQL Server – the PIVOT operator that was introduced back with SQL Server 2005. I always refer to this language construct as one of the most dangerous ones in SQL Server – and you will very soon see why. Before we get into some specific problems and pitfalls, I first want to give you a basic overview of what can be accomplished with pivoting in SQL Server.

Overview

The basic idea behind the PIVOT operator in SQL Server is that you can rotate rows into columns during a T-SQL query. The operator itself was introduced back with SQL Server 2005 and is mainly used for databases that are built on the principles of an Entity Attribute Value model (EAV). The idea behind an EAV model is that you can extend database entities without performing database schema changes. For that reason an EAV model stores all attributes of an entity in one table as key/value pairs. Let’s have a look at the following table that models a simple key/value pairs table.

As you can see, I have inserted 2 database entities into that table, and every entity consists of multiple attributes. Every attribute is just an additional record in that table. If you want to extend the entity with additional attributes, you need to insert additional records into that table but there is no need to perform a database schema change – that’s the “power” of an open database schema…

Querying such an EAV table is of course very hard, because you are dealing with a flat key/value pair structure. For that reason you want to rotate the table contents, and rotate rows into columns. You can perform this rotation natively with the PIVOT operator, or completely manually through a traditional CASE expression. Before we get down to the nasty details of the PIVOT operator I want to show you how you can perform the rotation by hand with T-SQL and some CASE expressions. If you perform the rotation manually, your T-SQL query has to implement 3 different phases:

  1. Grouping Phase
  2. Spreading Phase
  3. Aggregation Phase

In the Grouping Phase we compact our EAV table into distinct database entities. In our case we perform a GROUP BY on the column RecordID. In the 2nd phase, the Spreading Phase, we use multiple CASE expressions to rotate rows into columns. And finally we use the MAX expression in the Aggregation Phase to return one distinct value for every row and column. Let’s have a look at the following T-SQL code.

As you can see from the code, it is very easy to identify each phase, and how they are mapped to the T-SQL query. The following picture shows you the result of the query, where we have finally rotated rows into columns.

Manually pivoted data

The PIVOT operator

Beginning with SQL Server 2005 (almost 10 years ago!), Microsoft introduced the PIVOT operator in T-SQL. With that operator you can perform the same transformation (rows to columns) with just one native operator. Sounds very easy and promising, doesn’t it? The following listing shows how you can perform the same transformation natively with the PIVOT operator.

When you run that query, you will get back the same result as seen in the previous picture. But when you look at the syntax of the PIVOT operator, you will see one big difference compared to the manual approach:

You can only specify the spreading and aggregation elements! There is no way to define the grouping elements explicitly!

The grouping elements are just the remaining columns that you haven’t referenced in the PIVOT operator. In our case we haven’t referenced the column RecordID in the PIVOT operator, therefore this column is used during the Grouping Phase. This leads to interesting side effects if we subsequently change the database schema, for example by adding additional columns to the underlying base table:

When you now run the same query with the PIVOT operator again (make sure you have non NULL values in that column), you will get back a completely different result, because the Grouping Phase is now done on the columns RecordID and SomeData (which we have just added).

The native PIVOT operator can lead to a wrong result

Contrast that with what happens if we re-execute the manually T-SQL query that we wrote to start with. It still returns the same correct result. That’s one of the biggest negative side effects of the PIVOT operator in SQL Server: the grouping elements can’t be defined explicitly. To overcome this problem, it is a best practice to use a table expression and only return the necessary columns from the table. With this approach you will also have no problems if you change the table schema at a later stage, because the additional columns are not returned (by default) from the table expression. Let’s have a look at the following listing.

As you can see from this code, I’m feeding the PIVOT operator through a table expression. And within the table expression you are only selecting the needed columns from the base table. This then means that you can change the table schema in the future without breaking the result of the PIVOT query.

Summary

I hope that this blog posting has shown you why the PIVOT operator can be very dangerous in SQL Server. The syntax itself leads to very efficient code, but as a side effect you can’t specify the grouping elements directly. Therefore you should always make sure to use a table expression to define exactly which columns are fed into the PIVOT operator to make the given result deterministic.

What is your experience with the PIVOT operator? Do you like it or not? If you don’t like it, what would you want to have changed?

Like or share to get the source code.

Thanks for reading!

-Klaus

Do you like this post? Why not share it with the SQL Server community?

Database Shrink: NOTRUNCATE vs. TRUNCATEONLY

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

Two weeks ago I published my latest SQL Server Quickie, which covered the basics of Database Shrink operations in SQL Server, and why you should never ever use them. After the release of the Quickie, I got really good feedback on it, and one interesting question was about the additional NOTRUNCATE and TRUNCATEONLY options that are available. Therefore I want to take my time in this blog posting to discuss both options in more detail. Before you move on, please be sure to watch the SQL Server Quickie to have a basic understanding about what happens to your data file when you run a database shrink operation.

NOTRUNCATE

When you provide the option NOTRUNCATE to the database shrink command, SQL Server performs the shrink operation as I have described on the flipchart in the Quickie. This means that SQL Server moves pages from the end of your data file towards the beginning, where some free unallocated space is found. The main difference is that the data file itself is not shrunk.

This means that the unused space at the end of the data file is not reclaimed back in the file system. The physical size of your data file is the same as before the operation. After performing a database shrink with the NOTRUNCATE option you end up with some free space available at the end of your data file – and of course the index fragmentation that is introduced.

TRUNCATEONLY

When you run the database shrink command with the TRUNCATEONLY option, SQL Server truncates the data file from the end towards the beginning as long as you have some free unallocated space at the end of the data file. With this option SQL Server doesn’t move any pages within the data file. You yourself must make sure that there is some free space available at the end of your data file, so that TRUNCATEONLY can reclaim some space for you.

If you have read very carefully up to this point, you can now already imagine what happens under the hood when you run a regular database shrink operation without providing additional options:

  • In the first step SQL Server runs the database shrink operation with the NOTRUNCATE option to gain some free space at the end of your data file. Nothing happens here to the physical size of your file in the Windows file system. This step also introduces the index fragmentation.
  • In the second step SQL Server runs the database shrink operation with the TRUNCATEONLY option, and finally removes the unallocated extents at the end of the data file. It also shrinks down the physical size of the file in the Windows file system.

Summary

It doesn’t matter which options the database shrink command provides to you: a database shrink is never ever recommended! If you know that you have unallocated space at the end of your data file that you want to get rid of, it’s much safer to run a database shrink with the option TRUNCATEONLY, because you don’t introduce index fragmentation. But as soon as you use the option NOTRUNCATE, SQL Server has to move pages inside your data file, and this introduces serious index fragmentation.

Thanks for reading!

-Klaus

Do you like this post? Why not share it with the SQL Server community?

SQL Server Quickie #16 – Database Shrink Operations

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

Today I have uploaded the 16th SQL Server Quickie to YouTube. This time I’m talking about Database Shrink operations in SQL Server.

Like or share to get the source code.

Thanks for watching!

-Klaus

Do you like this post? Why not share it with the SQL Server community?

Why do we need UPDATE Locks in SQL Server?

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

Today I want to talk about a specific question that I almost get every time when I teach about Locking & Blocking in SQL Server: Why does SQL Server need to have Update Locks? Before we go down to the details of why they are needed, I first want to give you a basic overview of when an Update (U) Lock is acquired, and how the lock itself behaves regarding its compatibility.

In general an Update Lock is used in SQL Server when performing an UPDATE statement. When you look at the underlying query plan, you can see that such a plan always consists of 3 parts:

  • Reading data
  • Calculating new values
  • Writing data

Update Query Plan

When SQL Server initially reads the data to be changed in the first part of the query plan, Update Locks are acquired on the individual records. And finally these Update Locks are converted to Exclusive (X) Locks when the data is changed in the third part of the query plan. The question that arrises with this approach is always the same: why does SQL Server acquire Update Locks instead of Shared (S) Locks in the first phase? When you normally read data through a SELECT statement, a Shared Lock is also good enough. Why is there now a different approach with UPDATE query plans? Let’s have a more detailed look at it.

Deadlock Avoidance

First of all UPDATE Locks are needed to avoid deadlock situations in UPDATE query plans. Let’s try to imagine what happens when multiple UPDATE query plans acquire Shared Locks in the first phase of the plan, and afterwards convert these Shared Locks to Exclusive Locks when the data is finally changed in the third phase of the query plan:

  • The 1st query can’t convert the Shared Lock to an Exclusive Lock, because the 2nd query has already acquired a Shared Lock.
  • The 2nd query can’t convert the Shared Lock to an Exclusive Lock, because the 1st query has already acquired a Shared Lock.

That approach would lead to a traditional deadlock situation in a relational database:
Update Deadlock
That’s one of the main reasons why implementers of relational database engines have introduced Update Locks to avoid that specific deadlock situation. An Update Lock is only compatible with a Shared Lock, but isn’t compatible with another Update or Exclusive Lock. Therefore a deadlock situation can be avoided, because 2 UPDATE query plans can’t run concurrently at the same time. The 2nd query will just wait until the Update Lock can be acquired in the 1st phase of the query plan. An unpublished study of System R also showed that this kind of deadlock was the most prominent one. System R was initially implemented without any Update Locks.

Improved Concurrency

Instead of acquiring an Update Lock during the 1st phase, it would be also a viable option to acquire an Exclusive Lock directly in that phase. This will also overcome the deadlock problem, because an Exclusive Lock is not compatible with another Exclusive Lock. But the problem with that approach is limited concurrency, because in the mean time no other SELECT query can read the data that is currently exclusively locked. Therefore there is also the need for the Update Lock, because this specific lock is compatible with the traditional Shared Lock. As a result this means that other SELECT queries can read data, as long as individual Update Locks are not yet converted to Exclusive Locks. As a side-effect this will improve the concurrency of our parallel running queries. 

In traditional relational literature an Update Lock is a so-called Asymmetric Lock. In the context of the Update Lock that means that the Update Lock is compatible with the Shared Lock, but not vice-versa: the Shared Lock is not compatible with the Update Lock. But SQL Server doesn’t implement the Update Lock as an asymmetric one. The Update Lock is a symmetric one, which means that Update and Shared Locks are compatible in both directions. This will also improve the overall concurrency of the system, because it doesn’t introduce blocking situations between both lock types.

Summary

In todays blog posting I gave you an overview of Update Locks in SQL Server, and why they are needed. As you have seen there is a really strong need for Update Locks in a relational database, because otherwise it would yield to deadlock situations and decreased concurrency. I hope that you now have a better understanding of Update Locks, and how they are used in SQL Server.

Thanks for reading!

-Klaus

Do you like this post? Why not share it with the SQL Server community?

SQLBits recap

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

Wow, what an intensive last week! Last week I attended the SQLBits conference in Telford/UK – in the middle of nowhere :-) . Telford itself is around 30 minutes in the west of Birmingham (where I flew into), but there is almost no civilization – or I didn’t have found it…It was the first conference in my life where I haven’t seen anything else besides the hotel and the conference venue. This makes life very simplified.

During the conference I have presented 2 sessions. On Friday I have talked about Latches, Spinlocks, and Latch-Free Data Structures, and on Saturday I have presented my session about the Dangerous Beauty of Bookmark Lookups. Both sessions where high attended, and I had a huge fun to present them.

I have also already submitted my session materials (slides & samples) to the conference owners, so you should be able very soon to get the materials through the conference website. Besides the amazing speaker lineup, we also had a lot of fun at the Steampunk party on Friday, which was THE highlight during the conference! I can’t tell you too much about the party itself, because what happens at SQLBits stays in SQLBits ;-) .

I got pink hairs!

Thanks for reading

-Klaus

Do you like this post? Why not share it with the SQL Server community?

My upcoming speaking schedule

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

Over the next weeks and months I have an incredible speaking schedule around the whole world. You have plenty of opportunities to meet me in person, and discuss performance related topics in SQL Server.

My first stop is this week in Telford/UK for the SQLBits conference. I have been speaking at SQLBits for ages – I think my first conference participation was around 2009. On Friday I speak about “Latches, Spinlocks & Latch Free Data Structures” in SQL Server. If you have followed my blog postings over the last weeks, you can already see what to expect in this session. I will also show you a concrete example of Spinlock Contention in SQL Server, how you can troubleshoot, and finally solve it. On Saturday I have an additional session where I talk about “The Dangerous Beauty of Bookmark Lookups”. Bookmark Lookups are very powerful in SQL Server, but have a huge amount of side-effects. In this session we will concentrate on these side effects, and I will show you how you can overcome the various problems.

At the beginning of September I’m traveling for the first time in my life to the other hemisphere of the Earth – my trip brings me to Cape Town in South Africa, where I’m speaking at the SQLSaturday #301. On Friday, September 5 I give a whole day-long precon about “Practical SQL Server Performance Troubleshooting“, the precon that I have delivered in 2012 at the SQLPASS Summit in Seattle. In this precon we will work with a typical OLTP workload (based on the TPC-E benchmark), and work throughout the whole day to identify performance bottlenecks in SQL Server, and finally solve them – completely transparent to the workload itself. Be sure to register for my precon.

Two weeks after Cape Town, I’m flying to Dublin and delivering a precon and a regular session at the SQLSaturday #310 (September 19 – 20). My precon is about “Performance enhancements in SQL Server 2014“. Large parts are about the new cardinality estimator, and of course I will also talk about In-Memory OLTP and “Clustered” Column-Store Indexes. Dublin is always fun, and the recovery process afterwards is always very long – there is no fast recovery available ;-)

For the first time in my life I have also the honor to be a speaker at the SQL Server Days in Belgium (September 30 – October 1). I have already heard a lot of good things about this conference, so I’m really looking forward to be part of it, and delivering a session. My session will be about “Latches, Spinlocks and Latch Free Data Structures” – the same that I deliver this week in Telford/UK. If you miss SQLBits, there is still a chance to see this session once again in Europe.

The week after Belgium I will present at the SQLdays Conference in Munich/Germany (October 7 – 9), where I will deliver on Tuesday the same precon as in Cape Town – “Practical SQL Server Performance Troubleshooting“. I’m already looking forward to this conference, because I have known the owners for ages, and it’s a very nice conference in Germany.

At the beginning of November I travel to the west and flying to Seattle for the SQLPASS Summit, where I will also deliver my Latches, Spinlocks, and Latch Free Data Structures session. For me it’s the 7th participation as a speaker (I delivered my 1st session in 2006), so I’m already feeling like an old-timer. Maybe it is a coincidence that I’m losing hairs, and turning grey… ;-) The SQLPASS Summit is always an amazing experience, because it’s the largest SQL Server conference in the universe (assuming that only we on earth use SQL Server). I’m already looking forward to meeting all my friends and Twitter followers again in person.

As you can see I have an amazing and very time consuming speaking schedule over the next weeks and months. But that’s not all. In addition to my conference participations I also deliver 5 SQLpassion workshops in the autumn. I’m very proud to deliver my brand new SQL Server Query Tuning Workshop 4 times across Europe. This workshop covers the query tuning aspect of SQL Server, without going into DBA related topics. The main audience for the workshop is a SQL Server developer. I talk for 4 days about execution plans, execution plans, and execution plans. If you want to know the nasty details about SQL Server, I have my more in-depth SQL Server Performance Tuning & Troubleshooting Workshop, which covers all aspects of how to tune your complete SQL Server installation. It’s a more DBA centric course. Here are the dates and locations for my SQLpassion workshops:

  • SQL Server Query Tuning Workshops
    • October 13 – 16 in Zurich/Switzerland
    • October 20 – 23 in London/UK
    • October 27 – 30 in Vienna/Austria
    • November 17 – 20 in Utrecht/Netherlands
  • SQL Server Performance Tuning & Troubleshooting Workshop
    • November 24 – 28 in Antwerpen/Belgium

Besides that I’m using the quiet summer time for the various session preparations, and of course for preparing my new SQL Server Query Tuning Workshop. But my life doesn’t consist only of SQL Server. Besides my SQL Server business I run my Boeing 737-800 flight simulator, and I do a huge amount of DSLR photo/video shooting. I just ordered some books about Astrophotography and Time Lapsing. Imagine what you can expect when you combine both things together. Do you have ever seen stars during the day?

Stars in the Sky

And because I’m crazy I have started to run about a month ago. I already make a very good progress, and I’m hopefully able to complete my first 5k run in a few weeks!

Thanks for reading!

-Klaus

Do you like this post? Why not share it with the SQL Server community?

SQL Server Quickie #15 – Merge Join Operator

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

Today I have uploaded the 15th SQL Server Quickie to YouTube. This time I’m talking about the Merge Join Operator in SQL Server.

Like or share to get the source code.

Thanks for watching!

-Klaus

Do you like this post? Why not share it with the SQL Server community?

Introduction to Spinlocks in SQL Server

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

In last weeks blog posting I have talked about Latches in SQL Server. At the end I have also introduced a little bit the idea of spinlocks to you. Based on that foundation I will continue today the discussion about spinlocks in SQL Server, and will show you how you can troubleshoot them.

Why do we need Spinlocks?

As I have already pointed it out last week, it doesn’t make sense to put a latch in front of every shared data structure, and synchronize the access to the data structure across multiple threads with the latch. A latch has a huge overhead associated with it: when you can’t acquire a latch (because someone else has already an incompatible latch acquired), the query is forced to wait, and enters the SUSPENDED state. The query waits in the SUSPENDED state until the latch can be acquired, and afterwards moves on into the RUNNABLE state. The query remains in the RUNNABLE state as long as no CPU is available for query execution. As soon as the CPU is free, the query moves again into the RUNNING state and can finally access the shared data structure which is protected with the latch, which was successfully acquired. The following picture shows the state machine that SQLOS implements for the cooperative thread scheduling.

Cooperative Thread Scheduling.png

Because of the associated overhead of latches, it doesn’t make sense to protect “busy” data structures with a latch. For that reason SQL Server also implements so-called Spinlocks. A spinlock is like a latch a lightweight synchronization object used by the storage engine to synchronize thread access to shared data structures. The main difference to a latch is that you actively wait for the spinlock – without leaving the CPU. A “wait” on a spinlock always happens on the CPU, in the RUNNING state. You spin in a tight loop until the spinlock is acquired. It’s a so-called busy wait. The main advantage of a spinlock is that no context switches are involved when a query must wait on a spinlock. On the other hand busy waiting wastes CPU cycles that other queries might be able to use more productively.

To avoid waisting too much CPU cycles, SQL Server 2008 R2 and higher implements a so-called exponential backoff mechanism, where the thread stops spinning after some time and sleeps on the CPU. The interval after which a thread goes to sleep increases over time between the attemps to acquire the spinlock. This behavior can reduce the impact on CPU performance.

Spinlocks & Troubleshooting

The main DMV for troubleshooting spinlocks is sys.dm_os_spinlock_stats. Every row that is returned by that DMV represents one specific spinlock in SQL Server. SQL Server 2014 implements 262 different spinlocks. Let’s have a more detailed look at the various columns in this DMV.

  • name: The name of the spinlock
  • collisions: The number of times that threads were blocked by a spinlock when trying to access a protected data structure
  • spins: The number of times threads spinned in a loop trying to obtain the spinlock
  • spins_per_collision: Ratio between spins and collisions
  • sleep_time: The time that threads were sleeping because of a backoff
  • backoffs: The number of times that threads were backed-off to allow other threads to continue on the CPU

The most important column in this DMV is backoffs, because this column tells you how often a backoff event occurred for a specific spinlock type. And very high backoffs yield to high CPU consumption and a so-called Spinlock Contention in SQL Server. I have already seen SQL Server installations where 32 cores were running at 100% without performing any work – a typical symptom for spinlock contention.

To troubleshoot a spinlock contention problem in more detail you can use the XEvent sqlos.spinlock_backoff provided by Extended Events. This event is always raised when a backoff occurs. If you capture this event, you also have to make sure that you use a very good selective predicate, because backoffs will always occur in SQL Server. A good predicate can be a specific spinlock type, where you have already seen high backoffs through the above mentioned DMV. The following code sample shows how you can create such an XEvent session.

As you can see from the listing, I use here the histogram target to bucketize on the callstack. Therefore you can see which code path within SQL Server generated the highest backoffs for the specific spinlock type. You can even symbolize the call stack by enabling trace flag 3656. As a prerequisite you need to install the public symbols of SQL Server. Paul Randal (Blog, Twitter) has written a blog posting about it. Here you can see an output from this XEvent session.

sqldk.dll!XeSosPkg::spinlock_backoff::Publish+0×138
sqldk.dll!SpinlockBase::Sleep+0xc5
sqlmin.dll!Spinlock<129,7,1>::SpinToAcquireWithExponentialBackoff+0×169
sqlmin.dll!lck_lockInternal+0×841
sqlmin.dll!XactWorkspaceImp::GetSharedDBLockFromLockManager+0x18d
sqlmin.dll!XactWorkspaceImp::GetDBLockLocal+0x15b
sqlmin.dll!XactWorkspaceImp::GetDBLock+0x5a
sqlmin.dll!lockdb+0x4a sqlmin.dll!DBMgr::OpenDB+0x1ec
sqlmin.dll!sqlusedb+0xeb
sqllang.dll!usedb+0xb3
sqllang.dll!LoginUseDbHelper::UseByMDDatabaseId+0×93
sqllang.dll!LoginUseDbHelper::FDetermineSessionDb+0x3e1
sqllang.dll!FRedoLoginImpl+0xa1b
sqllang.dll!FRedoLogin+0x1c1
sqllang.dll!process_request+0x3ec
sqllang.dll!process_commands+0x4a3
sqldk.dll!SOS_Task::Param::Execute+0x21e
sqldk.dll!SOS_Scheduler::RunTask+0xa8
sqldk.dll!SOS_Scheduler::ProcessTasks+0×279
sqldk.dll!SchedulerManager::WorkerEntryPoint+0x24c
sqldk.dll!SystemThread::RunWorker+0x8f
sqldk.dll!SystemThreadDispatcher::ProcessWorker+0x3ab
sqldk.dll!SchedulerManager::ThreadEntryPoint+0×226

With the provided call stack, it is not that hard to identify in which area of SQL Server the spinlock contention occurred. In that specific call stack the contention occurred in the LOCK_HASH spinlock type that protects the hashtable of the lock manager. Every time when a lock or unlock operation in the lock manager is executed, a spinlock must be acquired on the corresponding hash bucket. As you can also see from the call stack the spinlock was acquired when calling the function GetSharedDBLockFromLockManager from the class XactWorkspaceImp. It’s an indication that a shared database lock was tried to be acquired, when connecting to a database. And this finally yielded to a spinlock contention in the LOCK_HASH spinlock with very high backoffs.

If you attend my talk Latches, Spinlocks, and Lock Free Data Structures at SQLbits (Telford, UK) in 2 weeks or at the SQLPASS Summit in Seattle in November, I will also show you how you can reproduce this spinlock contention, how to troubleshoot it, and finally how you can resolve it.

Summary

In this blog posting you have learned more about spinlocks in SQL Server. In the first part we have discussed why SQL Server needs to implement spinlocks. As you have seen, with spinlocks it’s just cheaper to protect a “busy” shared data structure from concurrent thread access – like the lock manager. And in the second section we had a more detailed look on how you can troubleshoot spinlock contention in SQL Server, and how you can identify with a symbolized call stack the root cause of the problem.

Thanks for reading!

-Klaus

Do you like this post? Why not share it with the SQL Server community?

Introduction to Latches in SQL Server

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

In today’s blog posting I want to talk about a more advanced, low-level synchronization object used by SQL Server: Latches. A latch is a lightweight synchronization object used by the Storage Engine of SQL Server to protect internal memory structures that can’t be accessed in a true multi-threaded fashion. In the first part of the blog posting I will talk about why there is a need for latches in SQL Server, and in the second part I will introduce various latch types to you, and how you can troubleshoot them.

Why do we need Latches?

Latches were first introduced in SQL Server 7.0, when Microsoft first introduced row-level locking. For row-level locking it was very important to introduce a concept like latching, because otherwise it would give rise to phenomena like Lost Updates in memory. As I have stated above, a latch is a lightweight synchronization object used by the Storage Engine to protect memory structures used internally by SQL Server. A latch is nothing more than a so-called Critical Section in multi-threaded programming – with some differences.

In traditional concurrent programming, a critical section is a code path, which must always run single-threaded – with only one thread at a time. A latch itself is a specialized version of a critical section, because it allows multiple concurrent readers. In the context of SQL Server this means that multiple threads can concurrently read a shared data structure, like a page in memory, but writing to that shared data structure must be performed single-threaded.

A latch is used to cordinate the physical execution of multiple threads within a database, whereas a lock is used on a logical level to achieve the required isolation based on the chosen isolation level of the transaction. You, as a developer or DBA, can influence locks in some ways – e.g. through the isolation level, or also through the various lock hints that are available in SQL Server. A latch on the other hand, can’t be controlled in a direct way. There are no latch hints in SQL Server, and there is also no latch isolation level available. The following table compares locks and latches against each other.

Locks vs. Latches

As you can see from that table, latches also support more fine grained modes like Keep and Destroy. A Keep latch is mainly used for reference counting, e.g. when you want to know how many other latches are waiting on a specific latch. And the Destroy latch is the most restrictive one (it even blocks the KP latch), which is used when a latch is destroyed, e.g. when the Lazy Writer wants to free up a page in memory. The following table gives you an overview of the latch compatibility matrix in SQL Server.

Latch Compatibility Matrix

In the following short flipchart demo, I want to show you why latches are needed in SQL Server, and which phenomena would happen without them.

As you have seen in the previous flipchart demo, consistency can’t be achieved in SQL Server with locking alone. SQL Server still has to access shared data structures that are not protected by the lock manager, like the page header. And even other components within SQL Server that have single-threaded code paths are built on the foundation of latches. Let’s continue therefore now with the various latch types in SQL Server, and how you can further troubleshoot them.

Latch Types & Troubleshooting

SQL Server distinguishes between 3 different types of latches:

  • IO Latches
  • Buffer Latches (BUF)
  • Non-Buffer Latches (Non-BUF)

Let’s have a more detailed look at these 3 different variations. I/O Latches are used by SQL Server when outstanding I/O operations against pages in the Buffer Pool are done – when you read and write from/to your storage subsystem. For these I/O latches SQL Server reports a wait type that starts with PAGEIOLATCH_. You can see the waiting times introduced with these types of latches in the DMV sys.dm_os_wait_stats in the following picture.

PAGEIOLATCH

With these latches SQL Server makes sure that pages are not read concurrently multiple times into the buffer pool, and that pages are not discarded from the buffer pool, when they are currently accessed by a query. In addition to the I/O latches SQL Server also supports so-called Buffer Latches, which are used to protect the pages in the buffer pool from concurrent running threads. These are the latches that SQL Server uses to prevent Lost Updates in memory, as I have demonstrated in the previous flipchart demo. Without these kinds of latches, it would be possible to read and write a page concurrently in the buffer pool, which would give rise to corruption of the pages in main memory. SQL Server also reports the waits introduced by these latches with wait types starting with PAGELATCH_*. These wait types are again reported to you through the DMV sys.dm_os_wait_stats. The most important thing here is that you hit contention in main memory, when there is not the term IO in the wait type name.

PAGELATCH

And finally SQL Server internally uses so-called Non-Buffer Latches to protect shared data structures besides the buffer pool itself. SQL Server also reports these latches in the DMV sys.dm_os_wait_stats with wait types starting with LATCH_.

LATCVH

But the waits reported in this DMV for Non-Buffer Latches are just a summary view of all individual latches that SQL Server uses internally. You can find a further breakdown of the individual latches in a separate DMV – sys.dm_os_latch_stats:

Latch Statistics

SQL Server 2014 internally uses 163 latches to synchronize access to shared data structures. One prominent latch is FGCB_ADD_REMOVE, which protects the so-called File Group Control Block (FGCB) of a file group during certain operations like:

  • File growth (manually & automatically)
  • Adding/dropping a file from a file group
  • Recalculating proportional fill weightings
  • Cycling through a files of a file group during the round-robin allocation

When you see high waits on that specific latch, you mainly have problems with too many auto growth operations, and therefore bad default settings of your database. When a query tries to read/write a protected data structure and has to wait for a latch, the query is always put into the suspended state, and has to wait until the latch can be acquired successfully. Therefore the query always goes through the complete query life cycle consisting of the states RUNNING, SUSPENDED, RUNNABLE, and finally RUNNING again. For that reason, enforcing the protection of a shared data structure only makes sense when the query holds the latch for a long time. That is because changing the state of the query also means performing a context switch in the Windows OS, which is a very expensive operation in terms of introduced CPU cycles.

Therefore it doesn’t make sense to put a latch in front of a shared data structure which will be read or written very frequently and only for a very short amount of time. In that case the needed context switches will kill the overall performance of SQL Server, and it would take too much time to go through the complete query life cycle (RUNNING, SUSPENDED, RUNNABLE). That’s the area where SQL Server introduces so-called Spinlocks. The Lock Manager is a good example of such a data structure: it needs single-threaded access when locking and unlocking data items (like a record, a page, etc.). But when you look at sys.dm_os_latch_stats, you will find no latch that protects the Lock Manger itself. The corresponding hash bucket in the hashtable used by the Lock Manager is protected by a spinlock – the LOCK_HASH spinlock. The spinlock must be acquired before executing a lock or unlock operation through the Lock Manager. But today I am not going to talk about spinlocks, because I plan a dedicated blog posting just about them – so you will have to wait for that:-)

Summary

In this blog posting we have looked at latches in SQL Server. As you have seen latches are lightweight synchronization objects used by SQL Server to protect shared data structures in memory. SQL Server distinguishes between 3 different types of latches – IO Latches, Buffer Latches, and Non-Buffer Latches. You have also seen how you can troubleshoot latch waits with DMVs like sys.dm_os_wait_stats and sys.dm_os_latch_stats.

Thanks for reading & watching!

-Klaus  

Do you like this post? Why not share it with the SQL Server community?