Lock-Free Data Structures

A few weeks ago, I blogged about Latches and Spinlocks in SQL Server. Both synchronization primitives are used to protect shared data structures within SQL Server, like pages in the Buffer Pool (through Latches) or locks in the Lock Manager’s hashtable (through a Spinlock). A completely new synchronization paradigm that you will see more and more often is so-called Lock-Free Data Structures. They are also one of the foundations on which In-Memory OLTP in SQL Server 2014 is built. Therefore, in today’s blog posting I want to give you a quick overview of and a general introduction to Lock-Free Data Structures and what they have to offer.

What are Lock-Free Data Structures?

A lock-free algorithm protects a shared data structure without traditional locks, by using a non-blocking algorithm. As you have already seen in the previous blog postings about Latches and Spinlocks, other threads are blocked when they can’t acquire a Latch or a Spinlock. When a thread waits for a Latch, SQL Server puts the thread into the SUSPENDED state; when a thread waits for a Spinlock, the thread has to spin actively on the CPU. Both approaches lead to a blocking scenario, which we want to avoid through a Non-Blocking Algorithm. Wikipedia has a really nice explanation of Non-Blocking Algorithms:

“A non-blocking algorithm ensures that threads competing for a shared resource do not have their execution indefinitely postponed by mutual exclusion. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress regardless of scheduling.”

The most important point to take away from this explanation is that a thread will not be blocked by another thread. This is possible because no traditional locks are used for the thread synchronization itself. Let’s have a more detailed look at a concrete example.

Spinlock Implementation
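
The code below is a minimal C-style sketch of such a spinlock – the names compare_and_swap, lock, and Foo follow the description in the text, everything else is purely illustrative:

// Illustrative model of the CMPXCHG logic: compare a value against an
// expected value, and only if they match, store the new value.
// On the CPU this happens as ONE atomic instruction; written as a plain
// C function like this, it is NOT thread-safe.
int compare_and_swap(int *value, int expected, int new_value)
{
    int old_value = *value;

    if (old_value == expected)
        *value = new_value;

    return old_value;
}

#define UNLOCKED 0
#define LOCKED   1

int lock = UNLOCKED;   // shared spinlock synchronization variable

void Foo()
{
    // Spin until this thread manages to switch the synchronization
    // variable from UNLOCKED to LOCKED
    while (compare_and_swap(&lock, UNLOCKED, LOCKED) != UNLOCKED)
    {
        // busy wait – the thread spins actively on the CPU
    }

    // ... critical section: access the shared data structure ...

    lock = UNLOCKED;   // release the spinlock again
}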

Let’s go through this code step-by-step. First of all, the function compare_and_swap corresponds to a single atomic hardware instruction directly at the CPU level – CMPXCHG. I just wanted to illustrate the logic that CMPXCHG implements: you compare a value to an expected value, and if they are the same, the old value is replaced with the new value. Because the whole logic of CMPXCHG is implemented as one atomic unit on the CPU itself, no other thread can interrupt the execution of this assembly opcode.

To store the state of the spinlock itself, a variable with the name lock is used. A thread has to spin in the while loop until the spinlock synchronization variable is unlocked. When that happens, the thread locks the synchronization variable and can finally enter the critical section in a thread-safe manner. Again, this is just a simplified (not thread-safe!) illustration of a Spinlock – things are a little more complicated in reality.

The biggest problem with this traditional approach is that a shared resource is involved in the thread synchronization: the spinlock synchronization variable lock. If one thread holds the spinlock and gets suspended, all the other threads get stuck in the while loop when they try to acquire the spinlock. You can avoid this problem by introducing a lock-free coding technique.

A Lock-Free Approach
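
Again, the following is just a minimal C-style sketch of how a lock-free version of Foo could look. The GCC builtin __sync_val_compare_and_swap stands in here for any atomic compare-and-swap primitive; the variable names are illustrative:

long shared_value = 0;   // the shared variable that was previously protected by the spinlock

void Foo(long amount)
{
    long old_value;
    long new_value;

    do
    {
        // Read the current value and compute the new value locally
        old_value = shared_value;
        new_value = old_value + amount;

        // Atomically install the new value, but only if no other thread has
        // modified shared_value in the meantime – otherwise simply retry.
    }
    while (__sync_val_compare_and_swap(&shared_value, old_value, new_value) != old_value);
}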

As you can see, the implementation of the method Foo has completely changed. Instead of trying to acquire a spinlock, the implementation just checks whether some other thread has modified the shared variable (earlier protected through a Spinlock) before the atomic addition is performed. This means that no separate lock variable is involved anymore, and threads no longer block each other. That’s the main idea of Lock-Free Data Structures and Non-Blocking Algorithms.

In-Memory OLTP in SQL Server 2014 uses the same approach to install page changes in the mapping table of the Bw-Tree. Therefore there is no locking, latching, or spinlocking involved. If In-Memory OLTP “sees” that the address of a page in the mapping table has changed, it means that another thread has started a modification on that page – but hasn’t completed it yet (some other thread was scheduled in the meantime on the CPU). The individual threads in In-Memory OLTP work together in a cooperative fashion. Therefore it is possible that the thread that has seen the modification in the mapping table just completes the “pending” operation – like a page split.

A page split in In-Memory OLTP consists of multiple atomic operations. Therefore one thread can begin a page split, and another thread can finally finish this page split. In a future blog posting I will also talk a little bit more about these types of page splits, and which changes are implemented in the Bw-Tree to make this sophisticated approach possible.

Summary

In today’s blog posting I have introduced the main idea behind Lock-Free Data Structures to you. The main idea is to check whether another thread has already modified the shared data before executing your own atomic operation, so there is no longer any need to protect a critical section through a synchronization primitive like a spinlock. The idea of Lock-Free Data Structures and Non-Blocking Algorithms is also used by In-Memory OLTP, which was first introduced with SQL Server 2014.

Thanks for reading and your time,

-Klaus

Grouping Sets: the CUBE and ROLLUP subclauses

In my last blog posting I talked about the grouping sets functionality in SQL Server. As you have seen with the examples from last week, it is very easy to perform individual groupings by simply defining the necessary grouping sets. But what if you want all possible groupings from a given set of columns – the so-called Power Set?

Of course, you can generate the power set manually with the grouping sets syntax, but that’s still a lot of code to write. Therefore I want to show you today two subclauses that the grouping sets functionality supports: the CUBE and ROLLUP subclauses.

CUBE Subclause

With the CUBE subclause you can generate all possible grouping sets for a given set of columns – the so-called Power Set. For the three columns a, b, and c, CUBE (a, b, c) produces the following groupings for you:

  • (a, b, c)
  • (a, b)
  • (a, c)
  • (b, c)
  • (a)
  • (b)
  • (c)
  • ()

The following query generates the power set for the columns CustomerID, SalesPersonID, and YEAR(OrderDate) manually through the grouping sets functionality introduced last week. 
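
The following sketch shows how such a query could look – SUM(TotalDue) is just an assumed aggregate for illustration:

-- The power set specified manually through GROUPING SETS
SELECT
    CustomerID,
    SalesPersonID,
    YEAR(OrderDate) AS OrderYear,
    SUM(TotalDue)   AS TotalDue
FROM Sales.SalesOrderHeader
GROUP BY GROUPING SETS
(
    (CustomerID, SalesPersonID, YEAR(OrderDate)),
    (CustomerID, SalesPersonID),
    (CustomerID, YEAR(OrderDate)),
    (SalesPersonID, YEAR(OrderDate)),
    (CustomerID),
    (SalesPersonID),
    (YEAR(OrderDate)),
    ()
);
GO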

As you can see from the code, you have to specify every possible combination, so it’s again a hard, cumbersome job to write such a query for what is a simple requirement. Things get much easier if you use the CUBE subclause instead of specifying individual grouping sets. Let’s have a look at the following code.
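
A sketch of the same aggregation with the CUBE subclause (again with SUM(TotalDue) as the assumed aggregate):

-- The same power set generated through the CUBE subclause
SELECT
    CustomerID,
    SalesPersonID,
    YEAR(OrderDate) AS OrderYear,
    SUM(TotalDue)   AS TotalDue
FROM Sales.SalesOrderHeader
GROUP BY CUBE (CustomerID, SalesPersonID, YEAR(OrderDate));
GO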

As you can see from the code, you only specify the columns, and SQL Server itself generates the power set out of it. That’s very simple, straightforward code compared with the first listing.

ROLLUP Subclause

Besides the CUBE subclause, SQL Server also supports the ROLLUP subclause since SQL Server 2008. With the ROLLUP subclause you are able to define a subset of the power set. The ROLLUP subclause also assumes a hierarchy between the individual columns. Imagine that we have our three columns a, b, and c again. When you use ROLLUP(a, b, c), it produces the following grouping sets:

  • (a, b, c)
  • (a, b)
  • (a)
  • ()

You can very easily see from these individual grouping sets that there is a hierarchy between the columns. Just substitute the columns a, b, and c with columns like OrderYear, OrderMonth, and OrderDay, and you will get an idea of the kind of analytical queries you can perform here. It’s the poor man’s implementation of SSAS (SQL Server Analysis Services). Let’s have a look at the following query that uses the ROLLUP subclause.
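
A sketch of such a ROLLUP query could look like this – SUM(TotalDue) is again just an assumed aggregate:

-- A hierarchical rollup: year, then month, then day
SELECT
    YEAR(OrderDate)  AS OrderYear,
    MONTH(OrderDate) AS OrderMonth,
    DAY(OrderDate)   AS OrderDay,
    SUM(TotalDue)    AS TotalDue
FROM Sales.SalesOrderHeader
GROUP BY ROLLUP (YEAR(OrderDate), MONTH(OrderDate), DAY(OrderDate));
GO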

The output of that query gives you the following individual grouping sets:

  • (OrderYear, OrderMonth, OrderDay)
  • (OrderYear, OrderMonth)
  • (OrderYear)
  • ()

The ROLLUP subclause has a very simple syntax, but the output that you are getting back is really powerful for data analysis!

Summary

I hope that you have enjoyed today’s blog posting about the CUBE and ROLLUP subclauses introduced with the grouping sets functionality in SQL Server 2008. If you have time, you can also leave a short comment and tell me whether you already use these subclauses in your databases, or whether you think they would make sense in your environment.

Thanks for reading and your time!

-Klaus

The power of Grouping Sets in SQL Server

Have you ever wanted to perform aggregations across multiple columns/dimensions in SQL Server without licensing SSAS (SQL Server Analysis Services)? I am not talking about using the Developer Edition in production, nor am I talking about installing a pirated copy of SQL Server.

Mission Impossible? Not really, because it’s possible through so-called Grouping Sets in SQL Server. In this blog posting I want to give you an overview of grouping sets, what kinds of queries you can write with them, and what performance advantages they offer.

Aggregations with Grouping Sets

Imagine you have an orders table, and you want to perform aggregations across multiple groups in a T-SQL query. In the context of the Sales.SalesOrderHeader table of the AdventureWorks2012 database, these groupings could be something like the following:

  • A grouping across “everything”
  • GROUP BY SalesPersonID, YEAR(OrderDate)
  • GROUP BY CustomerID, YEAR(OrderDate)
  • GROUP BY CustomerID, SalesPersonID, YEAR(OrderDate)

When you want to perform these individual groupings with a traditional T-SQL query, you need multiple statements, where you perform a UNION ALL between the individual result sets. Let’s have a look at such a query:
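
A sketch of how such a UNION ALL query could look – SUM(TotalDue) is just an assumed aggregate for illustration:

-- Grouping across "everything"
SELECT NULL AS CustomerID, NULL AS SalesPersonID, NULL AS OrderYear, SUM(TotalDue) AS TotalDue
FROM Sales.SalesOrderHeader

UNION ALL

-- GROUP BY SalesPersonID, YEAR(OrderDate)
SELECT NULL, SalesPersonID, YEAR(OrderDate), SUM(TotalDue)
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID, YEAR(OrderDate)

UNION ALL

-- GROUP BY CustomerID, YEAR(OrderDate)
SELECT CustomerID, NULL, YEAR(OrderDate), SUM(TotalDue)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, YEAR(OrderDate)

UNION ALL

-- GROUP BY CustomerID, SalesPersonID, YEAR(OrderDate)
SELECT CustomerID, SalesPersonID, YEAR(OrderDate), SUM(TotalDue)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, SalesPersonID, YEAR(OrderDate);
GO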

The approach used by this T-SQL statement has multiple disadvantages:

  • The T-SQL statement itself is huge, because every individual group is one distinct query.
  • The table Sales.SalesOrderHeader has to be accessed 4 times – once for every distinct query.
  • When you look at the execution plan, you can see that SQL Server performs an Index Seek (NonClustered) operation 4 times – once for every query.

You can dramatically simplify the T-SQL code that you need, if you use the grouping sets functionality introduced back in SQL Server 2008. The following code shows you the same query, but this time implemented with grouping sets.
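
A sketch with GROUPING SETS could look like this – SET STATISTICS IO ON is included so you can compare the logical reads against the previous approach:

SET STATISTICS IO ON;
GO

SELECT
    CustomerID,
    SalesPersonID,
    YEAR(OrderDate) AS OrderYear,
    SUM(TotalDue)   AS TotalDue
FROM Sales.SalesOrderHeader
GROUP BY GROUPING SETS
(
    (),                                           -- the Empty Grouping Set: aggregation across "everything"
    (SalesPersonID, YEAR(OrderDate)),
    (CustomerID, YEAR(OrderDate)),
    (CustomerID, SalesPersonID, YEAR(OrderDate))
);
GO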

As you can see from the code itself, you just specify the needed grouping sets inside the GROUP BY GROUPING SETS clause – everything else is performed transparently by SQL Server. The empty parentheses specify the so-called Empty Grouping Set, the aggregation across the whole table. When you look at the output of STATISTICS IO, you can see that the table Sales.SalesOrderHeader was accessed only once! That’s a huge difference from the previous manual implementation.

Within the execution plan SQL Server uses a Table Spool operator that stores the retrieved data temporarily in TempDb. The data from the work table created in TempDb is then used in the second branch of the execution plan. Therefore the base table doesn’t have to be rescanned for every group, which leads to better performance for the whole execution plan.

Query plan that calculates Grouping Sets

When you look at the execution plan, you can also see that the query plan contains 3 Stream Aggregate operators (highlighted in red, blue, and green). These 3 operators are calculating the individual grouping sets:

  • The blue highlighted operator calculates the grouping set for CustomerID, SalesPersonID, and YEAR(OrderDate).
  • The red highlighted operator calculates the grouping set for SalesPersonID and YEAR(OrderDate). In addition it also calculates the grouping set across “everything”.
  • The green highlighted operator calculates the grouping set for CustomerID and YEAR(OrderDate).

The idea behind the 2 subsequent Stream Aggregate operators is to calculate so-called Super Aggregates – aggregations of aggregations.

Summary

In today’s blog posting I have introduced Grouping Sets to you, a T-SQL enhancement introduced back in SQL Server 2008. As you have seen, grouping sets have two huge advantages: they simplify your code, and they improve query performance by accessing the base data only once.

I hope that you now have some good ideas about grouping sets. It would be really interesting for me if you left a short comment about whether you are able to use this functionality in your own databases.

Thanks for your time!

-Klaus

Forwarding Records on Heap Tables

In today’s blog posting I want to talk about a very specific performance problem that can occur on heap tables: Forwarding Records. First of all we have to clarify what a heap table is: a heap table is just a table without a Clustered Index. It is very fast for inserting new records, but it can be very slow when you are reading records. Reading records can introduce random I/O on your storage subsystem, and sometimes you will also encounter Forwarding Records, which further slow down your read performance.

Why Forwarding Records?

SQL Server uses Forwarding Records when a record in a heap table has to be moved to a different physical location. Imagine you have a heap table with a variable length column. In the first step you insert records into the heap table, without storing any data in the variable length column:
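
A minimal sketch of such a heap table could look like this – the table and column names are just assumptions for illustration:

-- A heap table: no Clustered Index is defined
CREATE TABLE dbo.HeapTable
(
    Col1 INT NOT NULL,
    Col2 VARCHAR(4000) NULL
);
GO

-- Insert a few records without storing any data in the variable length column
INSERT INTO dbo.HeapTable (Col1, Col2) VALUES (1, NULL);
INSERT INTO dbo.HeapTable (Col1, Col2) VALUES (2, NULL);
INSERT INTO dbo.HeapTable (Col1, Col2) VALUES (3, NULL);
INSERT INTO dbo.HeapTable (Col1, Col2) VALUES (4, NULL);
GO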

Imagine now what happens when you run an UPDATE statement against the variable length column. In that case SQL Server has to expand the record, and because of the larger record size the record may no longer fit on its original data page and must be moved to a different page.
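
Against the illustrative table from above, such an UPDATE could look like this – with record sizes like these, some of the expanded records typically no longer fit on their original page:

-- Expand the variable length column so that the records grow substantially
UPDATE dbo.HeapTable
SET Col2 = REPLICATE('A', 3000);
GO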

In that case, SQL Server leaves a so-called Forwarding Record at the original location that points to the new location where the record is finally stored.

Forwarding Records in SQL Server

SQL Server has to use this approach to avoid having to update all of the Non-Clustered Indexes on the same table. As you might know, when you create a Non-Clustered Index on a heap table, at the leaf level the Non-Clustered Index points to the physical position where the record data is stored. Without Forwarding Records, all these pointers would have to be changed, and that would slow down your performance tremendously. In the following short flipchart demo, I want to show you how SQL Server introduces Forwarding Records on heap tables.

How to fix Forwarding Records?

To find out whether a given heap table contains Forwarding Records, you can use the Dynamic Management Function (DMF) sys.dm_db_index_physical_stats. When you call that function on a heap table and pass in the DETAILED mode, SQL Server returns the number of Forwarding Records on the table in the column forwarded_record_count.
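
Against the illustrative table from above, such a call could look like this:

SELECT
    index_type_desc,
    page_count,
    record_count,
    forwarded_record_count
FROM sys.dm_db_index_physical_stats
(
    DB_ID(),                        -- current database
    OBJECT_ID(N'dbo.HeapTable'),    -- the heap table from above
    NULL,                           -- all indexes (a heap has index_id 0)
    NULL,                           -- all partitions
    'DETAILED'                      -- forwarded_record_count is NULL in LIMITED mode
);
GO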

As you can see, in our case we have two Forwarded Records on that table, out of four records! To get rid of these Forwarding Records you can rebuild your table.
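
Since SQL Server 2008 you can do that with a simple ALTER TABLE … REBUILD, sketched here against the illustrative table from above:

-- Rebuilding the heap removes the Forwarding Records
ALTER TABLE dbo.HeapTable REBUILD;
GO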

Afterwards, when you check the Forwarding Record count again through the DMF sys.dm_db_index_physical_stats, you can see that they are gone. DBAs always think about index fragmentation, index rebuilds, and index reorganize operations, but almost nobody thinks about Forwarding Records in heap tables. If you work with maintenance scripts, it’s always a good idea to also check the number of Forwarding Records on heap tables to make sure that you get the best possible performance.

Summary

In this blog posting you have seen how Forwarding Records on heap tables can slow down your reads, because of the additional logical reads that are needed. When I perform SQL Server Health Checks for my customers, I always check for heap tables in combination with Forwarding Records.

And trust me: there are a huge number of heap tables and even Forwarding Records out in production, and many DBAs are not aware of the side-effects. As a general rule of thumb I always recommend creating Clustered Indexes on your tables to get rid of Forwarding Records. Of course there are some extreme scenarios, like Last Page Insert Latch Contention, where this approach can itself introduce performance problems, but for most ordinary databases it will work just fine.

Thanks for reading!

-Klaus

Personal SQL Server Coaching

I travel a lot. I’m away from home a lot. I spread my passion for SQL Server around the whole world through my conference talks and my public trainings. Right now I am sitting in Cape Town, South Africa, writing these lines to you.

Because of my huge travel schedule and my passion for SQL Server, I want to try a new experiment together with you and offer you a new exclusive service: personal SQL Server Coaching in your city!

You can think about the personal SQL Server Coaching as follows:

  • Because of my huge travel schedule, I see many different cities and countries in the course of the year.
  • When I’m in your city, I offer a dedicated 2-hour SQL Server coaching slot in the evening (somewhere between 18:00 and 21:00, depending on your and my work schedule).
  • During these 2 hours we look at a specific performance problem from your SQL Server environment. Together we will work out concrete guidance, ideas, and solutions on how you can resolve that specific performance problem.
  • As a prerequisite you have to bring your performance problem in a reproducible way to our SQL Server coaching session (e.g. prepared on a notebook).

I can offer the following SQL Server Coaching slots during the rest of 2014:

  • Dublin/Ireland: September 18
  • Antwerp/Belgium: September 30
  • Utrecht/Netherlands: October 3
  • Munich/Germany: October 6, October 7, October 9
  • Zurich/Switzerland: October 14, October 15
  • London/UK: October 27, October 28, October 29
  • Seattle/USA: The week from November 3 – November 7
  • Karlsruhe/Germany: November 12, November 13
  • Utrecht/Netherlands: November 18, November 19
  • Antwerp/Belgium: November 25, November 26, November 27

If you are interested in personal SQL Server coaching, please contact me for further information, pricing, and availability of your preferred slot.

Be part of a new experiment, and feel the passion about SQL Server!

Thanks for reading.

-Klaus

SQL Server Quickie #17 – Round Robin Allocations

Today I have uploaded the 17th SQL Server Quickie to YouTube. This time I’m talking about Round Robin Allocations in SQL Server.

Thanks for watching!

-Klaus

The dangerous beauty of the PIVOT operator in SQL Server

In today’s blog posting I want to discuss one specific T-SQL language construct in SQL Server – the PIVOT operator that was introduced back with SQL Server 2005. I always refer to this language construct as one of the most dangerous ones in SQL Server – and you will very soon see why. Before we get into some specific problems and pitfalls, I first want to give you a basic overview of what can be accomplished with pivoting in SQL Server.

Overview

The basic idea behind the PIVOT operator in SQL Server is that you can rotate rows into columns during a T-SQL query. The operator itself was introduced back with SQL Server 2005 and is mainly used for databases that are built on the principles of an Entity Attribute Value (EAV) model. The idea behind an EAV model is that you can extend database entities without performing database schema changes. For that reason an EAV model stores all attributes of an entity as key/value pairs in one table. Let’s have a look at the following simple key/value pairs table.
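
A minimal sketch of such a key/value pairs table could look like this – the table name, attribute names, and values are just assumptions for illustration:

-- A simple key/value pairs table following the EAV model
CREATE TABLE dbo.KeyValuePairs
(
    RecordID       INT          NOT NULL,
    AttributeName  VARCHAR(100) NOT NULL,
    AttributeValue VARCHAR(100) NOT NULL
);
GO

-- 2 database entities; every attribute is just an additional record
INSERT INTO dbo.KeyValuePairs (RecordID, AttributeName, AttributeValue) VALUES
    (1, 'FirstName', 'John'),
    (1, 'LastName',  'Doe'),
    (1, 'City',      'Vienna'),
    (2, 'FirstName', 'Jane'),
    (2, 'LastName',  'Smith'),
    (2, 'City',      'London');
GO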

As you can see, I have inserted two database entities into that table, and every entity consists of multiple attributes. Every attribute is just an additional record in that table. If you want to extend an entity with additional attributes, you just insert additional records into the table – there is no need to perform a database schema change. That’s the “power” of an open database schema…

Querying such an EAV table is of course very hard, because you are dealing with a flat key/value pair structure. For that reason you want to rotate the table contents and turn rows into columns. You can perform this rotation natively with the PIVOT operator, or completely manually through traditional CASE expressions. Before we get down to the nasty details of the PIVOT operator, I want to show you how you can perform the rotation by hand with T-SQL and some CASE expressions. If you perform the rotation manually, your T-SQL query has to implement 3 different phases:

  1. Grouping Phase
  2. Spreading Phase
  3. Aggregation Phase

In the Grouping Phase we compact our EAV table into distinct database entities; in our case we perform a GROUP BY on the column RecordID. In the second phase, the Spreading Phase, we use multiple CASE expressions to rotate rows into columns. And finally we use the MAX aggregate in the Aggregation Phase to return one distinct value for every row and column. Let’s have a look at the following T-SQL code.
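
A sketch of the manual rotation against the illustrative table from above:

-- Manual pivoting through the Grouping, Spreading, and Aggregation Phases
SELECT
    RecordID,                                                                          -- Grouping Phase
    MAX(CASE WHEN AttributeName = 'FirstName' THEN AttributeValue END) AS FirstName,  -- Spreading + Aggregation Phase
    MAX(CASE WHEN AttributeName = 'LastName'  THEN AttributeValue END) AS LastName,
    MAX(CASE WHEN AttributeName = 'City'      THEN AttributeValue END) AS City
FROM dbo.KeyValuePairs
GROUP BY RecordID;
GO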

As you can see from the code, it is very easy to identify each phase, and how they are mapped to the T-SQL query. The following picture shows you the result of the query, where we have finally rotated rows into columns.

Manually pivoted data

The PIVOT operator

Beginning with SQL Server 2005 (almost 10 years ago!), Microsoft introduced the PIVOT operator in T-SQL. With that operator you can perform the same transformation (rows to columns) with just one native operator. Sounds very easy and promising, doesn’t it? The following listing shows how you can perform the same transformation natively with the PIVOT operator.
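
Against the illustrative table from above, the native variant could look like this:

-- The same transformation with the native PIVOT operator
SELECT RecordID, FirstName, LastName, City
FROM dbo.KeyValuePairs
PIVOT
(
    MAX(AttributeValue)                                  -- Aggregation element
    FOR AttributeName IN (FirstName, LastName, City)     -- Spreading element
) AS p;
GO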

When you run that query, you will get back the same result as seen in the previous picture. But when you look at the syntax of the PIVOT operator, you will see one big difference compared to the manual approach:

You can only specify the spreading and aggregation elements! There is no way to define the grouping elements explicitly!

The grouping elements are just the remaining columns that you haven’t referenced in the PIVOT operator. In our case we haven’t referenced the column RecordID in the PIVOT operator, therefore this column is used during the Grouping Phase. This leads to interesting side effects if we subsequently change the database schema, for example by adding additional columns to the underlying base table:
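
For example, a schema change like the following – the data type and default value of the new column are just assumptions for illustration:

-- A later schema change on the underlying base table
ALTER TABLE dbo.KeyValuePairs ADD SomeData CHAR(1) NOT NULL DEFAULT 'A';
GO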

When you now run the same query with the PIVOT operator again (make sure you have non-NULL values in that column), you will get back a completely different result, because the Grouping Phase is now performed on the columns RecordID and SomeData (which we have just added).

The native PIVOT operator can lead to a wrong result

Contrast that with what happens if we re-execute the manual T-SQL query that we wrote to start with: it still returns the same correct result. That’s one of the biggest negative side effects of the PIVOT operator in SQL Server: the grouping elements can’t be defined explicitly. To overcome this problem, it is a best practice to feed the PIVOT operator through a table expression that returns only the necessary columns from the table. With this approach you will also have no problems if you change the table schema at a later stage, because the additional columns are not returned from the table expression. Let’s have a look at the following listing.
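
A sketch of this approach against the illustrative table from above:

-- Feeding PIVOT through a table expression that returns only the needed columns
SELECT RecordID, FirstName, LastName, City
FROM
(
    SELECT RecordID, AttributeName, AttributeValue
    FROM dbo.KeyValuePairs
) AS t
PIVOT
(
    MAX(AttributeValue)
    FOR AttributeName IN (FirstName, LastName, City)
) AS p;
GO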

As you can see from this code, I’m feeding the PIVOT operator through a table expression. And within the table expression you are only selecting the needed columns from the base table. This then means that you can change the table schema in the future without breaking the result of the PIVOT query.

Summary

I hope that this blog posting has shown you why the PIVOT operator can be very dangerous in SQL Server. The syntax itself leads to very compact code, but as a side effect you can’t specify the grouping elements explicitly. Therefore you should always make sure to use a table expression that defines exactly which columns are fed into the PIVOT operator, so that the result stays correct even if the underlying table schema changes.

What is your experience with the PIVOT operator? Do you like it or not? If you don’t like it, what would you want to have changed?

Thanks for reading!

-Klaus

Database Shrink: NOTRUNCATE vs. TRUNCATEONLY

Two weeks ago I published my latest SQL Server Quickie, which covered the basics of Database Shrink operations in SQL Server, and why you should never ever use them. After the release of the Quickie, I got really good feedback on it, and one interesting question was about the additional NOTRUNCATE and TRUNCATEONLY options that are available. Therefore I want to take the time in this blog posting to discuss both options in more detail. Before you read on, please be sure to watch the SQL Server Quickie so you have a basic understanding of what happens to your data file when you run a database shrink operation.

NOTRUNCATE

When you provide the option NOTRUNCATE to the database shrink command, SQL Server performs the shrink operation as I have described on the flipchart in the Quickie. This means that SQL Server moves pages from the end of your data file towards the beginning, where some free unallocated space is found. The main difference is that the data file itself is not shrunk.

This means that the unused space at the end of the data file is not reclaimed back in the file system. The physical size of your data file is the same as before the operation. After performing a database shrink with the NOTRUNCATE option you end up with some free space available at the end of your data file – and of course the index fragmentation that is introduced.

TRUNCATEONLY

When you run the database shrink command with the TRUNCATEONLY option, SQL Server truncates the data file from the end towards the beginning as long as you have some free unallocated space at the end of the data file. With this option SQL Server doesn’t move any pages within the data file. You yourself must make sure that there is some free space available at the end of your data file, so that TRUNCATEONLY can reclaim some space for you.
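
For reference, both variants sketched against a placeholder database name:

-- Move pages towards the beginning of the data files, but keep the physical file size
-- (this is the step that introduces the index fragmentation)
DBCC SHRINKDATABASE (N'MyDatabase', NOTRUNCATE);
GO

-- Only release the free, unallocated space at the end of the data files back to the
-- file system; no pages are moved
DBCC SHRINKDATABASE (N'MyDatabase', TRUNCATEONLY);
GO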

If you have read very carefully up to this point, you can now already imagine what happens under the hood when you run a regular database shrink operation without providing additional options:

  • In the first step SQL Server runs the database shrink operation with the NOTRUNCATE option to gain some free space at the end of your data file. Nothing happens here to the physical size of your file in the Windows file system. This step also introduces the index fragmentation.
  • In the second step SQL Server runs the database shrink operation with the TRUNCATEONLY option, and finally removes the unallocated extents at the end of the data file. It also shrinks down the physical size of the file in the Windows file system.

Summary

It doesn’t matter which options the database shrink command provides to you: a database shrink is never ever recommended! If you know that you have unallocated space at the end of your data file that you want to get rid of, it’s much safer to run a database shrink with the option TRUNCATEONLY, because you don’t introduce index fragmentation. But as soon as you use the option NOTRUNCATE, SQL Server has to move pages inside your data file, and this introduces serious index fragmentation.

Thanks for reading!

-Klaus

SQL Server Quickie #16 – Database Shrink Operations

Today I have uploaded the 16th SQL Server Quickie to YouTube. This time I’m talking about Database Shrink operations in SQL Server.

Thanks for watching!

-Klaus

Why do we need UPDATE Locks in SQL Server?

Today I want to talk about a specific question that I get almost every time I teach about Locking & Blocking in SQL Server: why does SQL Server need Update Locks? Before we go into the details of why they are needed, I first want to give you a basic overview of when an Update (U) Lock is acquired, and how the lock behaves regarding its compatibility with other locks.

In general an Update Lock is used in SQL Server when performing an UPDATE statement. When you look at the underlying query plan, you can see that such a plan always consists of 3 parts:

  • Reading data
  • Calculating new values
  • Writing data

Update Query Plan

When SQL Server initially reads the data to be changed in the first part of the query plan, Update Locks are acquired on the individual records. These Update Locks are then converted to Exclusive (X) Locks when the data is changed in the third part of the query plan. The question that arises with this approach is always the same: why does SQL Server acquire Update Locks instead of Shared (S) Locks in the first phase? When you read data through a normal SELECT statement, a Shared Lock is good enough. So why is there a different approach for UPDATE query plans? Let’s have a more detailed look at it.
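
As a small illustration (this is not the UPDATE query plan itself), you can also acquire Update Locks explicitly with the UPDLOCK table hint and look at them through sys.dm_tran_locks – the table and key value below are just an example from the AdventureWorks2012 database:

USE AdventureWorks2012;
GO

BEGIN TRANSACTION;

-- Acquire an Update (U) Lock on the qualifying row without changing it
SELECT SalesOrderID, TotalDue
FROM Sales.SalesOrderHeader WITH (UPDLOCK)
WHERE SalesOrderID = 43659;

-- request_mode shows "U" for the key lock held by this session
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;

ROLLBACK TRANSACTION;
GO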

Deadlock Avoidance

First of all, Update Locks are needed to avoid deadlock situations in UPDATE query plans. Let’s imagine what would happen if multiple UPDATE query plans acquired Shared Locks in the first phase of the plan, and afterwards converted these Shared Locks to Exclusive Locks when the data is finally changed in the third phase of the query plan:

  • The 1st query can’t convert the Shared Lock to an Exclusive Lock, because the 2nd query has already acquired a Shared Lock.
  • The 2nd query can’t convert the Shared Lock to an Exclusive Lock, because the 1st query has already acquired a Shared Lock.

That approach would lead to a traditional deadlock situation in a relational database:
Update Deadlock
That’s one of the main reasons why implementers of relational database engines introduced Update Locks: to avoid that specific deadlock situation. An Update Lock is only compatible with a Shared Lock, but isn’t compatible with another Update or Exclusive Lock. Therefore the deadlock situation is avoided, because two UPDATE query plans can’t acquire their Update Locks on the same data concurrently: the second query just waits until the Update Lock can be acquired in the first phase of the query plan. An unpublished study of System R also showed that this kind of deadlock was the most prominent one – System R was initially implemented without any Update Locks.

Improved Concurrency

Instead of acquiring an Update Lock during the first phase, it would also be a viable option to acquire an Exclusive Lock directly in that phase. This would also overcome the deadlock problem, because an Exclusive Lock is not compatible with another Exclusive Lock. But the problem with that approach is limited concurrency, because in the meantime no other SELECT query can read the data that is exclusively locked. Therefore there is also the need for the Update Lock, because this specific lock is compatible with the traditional Shared Lock. As a result, other SELECT queries can read the data as long as the individual Update Locks have not yet been converted to Exclusive Locks. As a side-effect this improves the concurrency of our parallel running queries.

In traditional relational literature an Update Lock is a so-called Asymmetric Lock: the Update Lock is compatible with the Shared Lock, but not vice versa – the Shared Lock is not compatible with the Update Lock. SQL Server, however, doesn’t implement the Update Lock as an asymmetric one. The Update Lock in SQL Server is a symmetric one, which means that Update and Shared Locks are compatible in both directions. This also improves the overall concurrency of the system, because it doesn’t introduce blocking situations between the two lock types.

Summary

In today’s blog posting I gave you an overview of Update Locks in SQL Server, and why they are needed. As you have seen, there is a really strong need for Update Locks in a relational database, because without them you would end up with deadlock situations and decreased concurrency. I hope that you now have a better understanding of Update Locks, and how they are used in SQL Server.

Thanks for reading!

-Klaus
