Why do we need UPDATE Locks in SQL Server?

Today I want to talk about a specific question that I get almost every time when I teach about Locking & Blocking in SQL Server: why does SQL Server need Update Locks? Before we dig into the details of why they are needed, I first want to give you a basic overview of when an Update (U) Lock is acquired, and how the lock itself behaves regarding its compatibility.

In general an Update Lock is used in SQL Server when performing an UPDATE statement. When you look at the underlying query plan, you can see that such a plan always consists of 3 parts:

  • Reading data
  • Calculating new values
  • Writing data

Update Query Plan

When SQL Server initially reads the data to be changed in the first part of the query plan, Update Locks are acquired on the individual records. These Update Locks are finally converted to Exclusive (X) Locks when the data is changed in the third part of the query plan. The question that arises with this approach is always the same: why does SQL Server acquire Update Locks instead of Shared (S) Locks in the first phase? When you normally read data through a SELECT statement, a Shared Lock is good enough. Why is there a different approach with UPDATE query plans? Let’s have a more detailed look at it.

Deadlock Avoidance

First of all, Update Locks are needed to avoid deadlock situations in UPDATE query plans. Let’s imagine what would happen if multiple UPDATE query plans acquired Shared Locks in the first phase of the plan, and afterwards converted these Shared Locks to Exclusive Locks when the data is finally changed in the third phase of the query plan:

  • The 1st query can’t convert the Shared Lock to an Exclusive Lock, because the 2nd query has already acquired a Shared Lock.
  • The 2nd query can’t convert the Shared Lock to an Exclusive Lock, because the 1st query has already acquired a Shared Lock.

That approach would lead to a traditional deadlock situation in a relational database:
Update Deadlock
That’s one of the main reasons why implementers of relational database engines introduced Update Locks: to avoid that specific deadlock situation. An Update Lock is compatible with a Shared Lock, but isn’t compatible with another Update or Exclusive Lock. Therefore the deadlock situation can be avoided, because 2 UPDATE query plans can’t run concurrently on the same data. The 2nd query just waits until the Update Lock can be acquired in the 1st phase of the query plan. An unpublished study of System R also showed that this kind of deadlock was the most prominent one – System R was initially implemented without any Update Locks.
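
If you want to see the deadlock that Update Locks prevent, you can simulate the Shared-to-Exclusive conversion yourself. The following is only a minimal sketch, assuming a hypothetical table dbo.Accounts with a clustered primary key on AccountID; the REPEATABLEREAD hint keeps the Shared Lock until the end of the transaction, which mimics the scenario described above.

-- Session 1: read the row and keep the Shared Lock until the end of the transaction
BEGIN TRANSACTION
SELECT Balance FROM dbo.Accounts WITH (REPEATABLEREAD)
WHERE AccountID = 1
GO

-- Session 2: the same read - both sessions now hold a Shared Lock on the row
BEGIN TRANSACTION
SELECT Balance FROM dbo.Accounts WITH (REPEATABLEREAD)
WHERE AccountID = 1
GO

-- Session 1: tries to convert its Shared Lock to an Exclusive Lock and blocks
UPDATE dbo.Accounts SET Balance = Balance + 10 WHERE AccountID = 1
GO

-- Session 2: tries the same conversion - SQL Server detects the deadlock
-- and rolls one of the two sessions back as the deadlock victim (error 1205)
UPDATE dbo.Accounts SET Balance = Balance + 20 WHERE AccountID = 1
GO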

Improved Concurrency

Instead of acquiring an Update Lock during the 1st phase, it would also be a viable option to acquire an Exclusive Lock directly in that phase. This would also overcome the deadlock problem, because an Exclusive Lock is not compatible with another Exclusive Lock. But the problem with that approach is limited concurrency, because in the meantime no other SELECT query can read the data that is currently exclusively locked. Therefore there is also the need for the Update Lock, because this specific lock is compatible with the traditional Shared Lock. As a result, other SELECT queries can still read the data, as long as the individual Update Locks are not yet converted to Exclusive Locks. As a side-effect this improves the concurrency of our parallel running queries.

In traditional relational literature an Update Lock is a so-called Asymmetric Lock. In that context it means that the Update Lock is compatible with the Shared Lock, but not vice-versa: the Shared Lock is not compatible with the Update Lock. SQL Server, however, doesn’t implement the Update Lock as an asymmetric one. The Update Lock is a symmetric one, which means that Update and Shared Locks are compatible in both directions. This also improves the overall concurrency of the system, because it doesn’t introduce blocking situations between the two lock types.
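
You can observe this compatibility yourself through sys.dm_tran_locks. The following is again just a minimal sketch, using the hypothetical dbo.Accounts table from above; the UPDLOCK hint forces the SELECT to acquire an Update Lock, which a plain SELECT in another session can still read past, while a second UPDLOCK request has to wait.

-- Session 1: take an Update Lock on the row and hold it
BEGIN TRANSACTION
SELECT Balance FROM dbo.Accounts WITH (UPDLOCK)
WHERE AccountID = 1

-- Session 1: inspect the granted locks of the current session,
-- you should see a KEY lock with request_mode = 'U'
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
GO

-- Session 2: a normal read still succeeds, because Shared and Update Locks are compatible
SELECT Balance FROM dbo.Accounts WHERE AccountID = 1
GO

-- Session 2: a second Update Lock request is blocked until Session 1 commits or rolls back
SELECT Balance FROM dbo.Accounts WITH (UPDLOCK)
WHERE AccountID = 1
GO

-- Session 1: release the locks again
ROLLBACK TRANSACTION
GO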

Summary

In today’s blog posting I gave you an overview of Update Locks in SQL Server, and why they are needed. As you have seen, there is a really strong need for Update Locks in a relational database, because otherwise you would end up with deadlock situations and decreased concurrency. I hope that you now have a better understanding of Update Locks, and how they are used in SQL Server.

Thanks for reading!

-Klaus

SQLBits recap

Wow, what an intense last week! Last week I attended the SQLBits conference in Telford/UK – in the middle of nowhere :-) . Telford itself is around 30 minutes west of Birmingham (where I flew into), but there is almost no civilization – or at least I didn’t find it… It was the first conference in my life where I haven’t seen anything besides the hotel and the conference venue. That makes life very simple.

During the conference I presented 2 sessions. On Friday I talked about Latches, Spinlocks, and Latch-Free Data Structures, and on Saturday I presented my session about the Dangerous Beauty of Bookmark Lookups. Both sessions were well attended, and I had huge fun presenting them.

I have also already submitted my session materials (slides & samples) to the conference organizers, so you should be able to get the materials through the conference website very soon. Besides the amazing speaker lineup, we also had a lot of fun at the Steampunk party on Friday, which was THE highlight of the conference! I can’t tell you too much about the party itself, because what happens at SQLBits stays at SQLBits ;-) .

I got pink hair!

Thanks for reading

-Klaus

My upcoming speaking schedule

Over the next weeks and months I have an incredible speaking schedule around the whole world. You have plenty of opportunities to meet me in person, and discuss performance related topics in SQL Server.

My first stop is this week in Telford/UK for the SQLBits conference. I have been speaking at SQLBits for ages – I think my first conference participation was around 2009. On Friday I speak about “Latches, Spinlocks & Latch Free Data Structures” in SQL Server. If you have followed my blog postings over the last weeks, you already know what to expect in this session. I will also show you a concrete example of Spinlock Contention in SQL Server, how you can troubleshoot it, and finally solve it. On Saturday I have an additional session where I talk about “The Dangerous Beauty of Bookmark Lookups”. Bookmark Lookups are very powerful in SQL Server, but they have a huge number of side-effects. In this session we will concentrate on these side-effects, and I will show you how you can overcome the various problems.

At the beginning of September I’m traveling for the first time in my life to the other hemisphere of the Earth – my trip brings me to Cape Town in South Africa, where I’m speaking at SQLSaturday #301. On Friday, September 5 I’m giving a full-day precon about “Practical SQL Server Performance Troubleshooting“, the precon that I delivered in 2012 at the SQLPASS Summit in Seattle. In this precon we will work with a typical OLTP workload (based on the TPC-E benchmark) throughout the whole day to identify performance bottlenecks in SQL Server, and finally solve them – completely transparent to the workload itself. Be sure to register for my precon.

Two weeks after Cape Town, I’m flying to Dublin to deliver a precon and a regular session at SQLSaturday #310 (September 19 – 20). My precon is about “Performance Enhancements in SQL Server 2014“. Large parts of it are about the new cardinality estimator, and of course I will also talk about In-Memory OLTP and “Clustered” Column-Store Indexes. Dublin is always fun, and the recovery process afterwards is always very long – there is no fast recovery available ;-)

For the first time I also have the honor of being a speaker at the SQL Server Days in Belgium (September 30 – October 1). I have already heard a lot of good things about this conference, so I’m really looking forward to being part of it and delivering a session. My session will be about “Latches, Spinlocks and Latch Free Data Structures” – the same one that I deliver this week in Telford/UK. If you miss SQLBits, there is still a chance to see this session once again in Europe.

The week after Belgium I will present at the SQLdays Conference in Munich/Germany (October 7 – 9), where on Tuesday I will deliver the same precon as in Cape Town – “Practical SQL Server Performance Troubleshooting“. I’m already looking forward to this conference, because I have known the organizers for ages, and it’s a very nice conference in Germany.

At the beginning of November I travel west and fly to Seattle for the SQLPASS Summit, where I will also deliver my Latches, Spinlocks, and Latch Free Data Structures session. For me it’s my 7th participation as a speaker (I delivered my 1st session in 2006), so I’m already feeling like an old-timer. Maybe it is no coincidence that I’m losing hair and turning grey… ;-) The SQLPASS Summit is always an amazing experience, because it’s the largest SQL Server conference in the universe (assuming that only we on Earth use SQL Server). I’m already looking forward to meeting all my friends and Twitter followers again in person.

As you can see, I have an amazing and very time consuming speaking schedule over the next weeks and months. But that’s not all. In addition to my conference participations I’m also delivering 5 SQLpassion workshops in the autumn. I’m very proud to deliver my brand new SQL Server Query Tuning Workshop 4 times across Europe. This workshop covers the query tuning side of SQL Server, without going into DBA related topics. The main audience for the workshop is SQL Server developers. I talk for 4 days about execution plans, execution plans, and execution plans. If you want to know the nasty details about SQL Server, there is my more in-depth SQL Server Performance Tuning & Troubleshooting Workshop, which covers all aspects of how to tune your complete SQL Server installation. It’s a more DBA centric course. Here are the dates and locations for my SQLpassion workshops:

  • SQL Server Query Tuning Workshops
    • October 13 – 16 in Zurich/Switzerland
    • October 20 – 23 in London/UK
    • October 27 – 30 in Vienna/Austria
    • November 17 – 20 in Utrecht/Netherlands
  • SQL Server Performance Tuning & Troubleshooting Workshop
    • November 24 – 28 in Antwerpen/Belgium

Besides that, I’m using the quiet summer time for the various session preparations, and of course for preparing my new SQL Server Query Tuning Workshop. But my life doesn’t consist only of SQL Server. Besides my SQL Server business I run my Boeing 737-800 flight simulator, and I do a huge amount of DSLR photo/video shooting. I just ordered some books about astrophotography and time lapsing. Imagine what you can expect when you combine both things together. Have you ever seen stars during the day?

Stars in the Sky

And because I’m crazy, I started running about a month ago. I’m already making very good progress, and hopefully I will be able to complete my first 5k run in a few weeks!

Thanks for reading!

-Klaus

SQL Server Quickie #15 – Merge Join Operator

Today I have uploaded the 15th SQL Server Quickie to YouTube. This time I’m talking about the Merge Join Operator in SQL Server. You can find the script used during the demonstration here as a download.

Thanks for watching!

-Klaus

Introduction to Spinlocks in SQL Server

In last week’s blog posting I talked about Latches in SQL Server. At the end I also briefly introduced the idea of spinlocks. Based on that foundation, today I will continue the discussion about spinlocks in SQL Server and show you how you can troubleshoot them.

Why do we need Spinlocks?

As I already pointed out last week, it doesn’t make sense to put a latch in front of every shared data structure and synchronize access to the data structure across multiple threads with the latch. A latch has a huge overhead associated with it: when you can’t acquire a latch (because someone else has already acquired an incompatible latch), the query is forced to wait and enters the SUSPENDED state. The query waits in the SUSPENDED state until the latch can be acquired, and afterwards moves on into the RUNNABLE state. The query remains in the RUNNABLE state as long as no CPU is available for query execution. As soon as a CPU is free, the query moves again into the RUNNING state and can finally access the shared data structure, which is protected by the latch that was successfully acquired. The following picture shows the state machine that SQLOS implements for cooperative thread scheduling.

Cooperative Thread Scheduling

Because of the overhead associated with latches, it doesn’t make sense to protect a “busy” data structure with a latch. For that reason SQL Server also implements so-called Spinlocks. A spinlock is, like a latch, a lightweight synchronization object used by the storage engine to synchronize thread access to shared data structures. The main difference to a latch is that you actively wait for the spinlock – without leaving the CPU. A “wait” on a spinlock always happens on the CPU, in the RUNNING state. You spin in a tight loop until the spinlock is acquired. It’s a so-called busy wait. The main advantage of a spinlock is that no context switches are involved when a query must wait on a spinlock. On the other hand, busy waiting wastes CPU cycles that other queries might be able to use more productively.

To avoid wasting too many CPU cycles, SQL Server 2008 R2 and higher implement a so-called exponential backoff mechanism, where the thread stops spinning after some time and goes to sleep. The interval after which a thread goes to sleep increases over time between the attempts to acquire the spinlock. This behavior reduces the impact on CPU performance.

Spinlocks & Troubleshooting

The main DMV for troubleshooting spinlocks is sys.dm_os_spinlock_stats. Every row that is returned by that DMV represents one specific spinlock in SQL Server. SQL Server 2014 implements 262 different spinlocks. Let’s have a more detailed look at the various columns in this DMV.

  • name: The name of the spinlock
  • collisions: The number of times that threads were blocked by a spinlock when trying to access a protected data structure
  • spins: The number of times threads spun in a loop trying to obtain the spinlock
  • spins_per_collision: Ratio between spins and collisions
  • sleep_time: The time that threads were sleeping because of a backoff
  • backoffs: The number of times that threads were backed-off to allow other threads to continue on the CPU

The most important column in this DMV is backoffs, because it tells you how often a backoff event occurred for a specific spinlock type. Very high backoffs lead to high CPU consumption and so-called Spinlock Contention in SQL Server. I have already seen SQL Server installations where 32 cores were running at 100% without performing any real work – a typical symptom of spinlock contention.
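
As a starting point you can simply order the output of the DMV by the backoffs column. The following is a minimal sketch of such a query; the TOP 10 cut-off is just an arbitrary choice.

-- Show the spinlock types with the highest backoffs since the last restart
SELECT TOP 10
    name,
    collisions,
    spins,
    spins_per_collision,
    sleep_time,
    backoffs
FROM sys.dm_os_spinlock_stats
ORDER BY backoffs DESC
GO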

To troubleshoot a spinlock contention problem in more detail, you can use the Extended Event sqlos.spinlock_backoff. This event is raised every time a backoff occurs. If you capture this event, you also have to make sure that you use a very selective predicate, because backoffs will always occur in SQL Server. A good predicate is a specific spinlock type for which you have already seen high backoffs through the above mentioned DMV. The following code sample shows how you can create such an XEvent session.

-- Retrieve the type value for the LOCK_HASH spinlock.
-- That value is used by the next XEvent session
SELECT * FROM sys.dm_xe_map_values
WHERE name = 'spinlock_types'
AND map_value = 'LOCK_HASH'
GO

-- Tracks the spinlock_backoff event
CREATE EVENT SESSION SpinlockContention ON SERVER 
ADD EVENT sqlos.spinlock_backoff
(
    ACTION
	(
		package0.callstack
	)
	WHERE
	(
		[type] = 129 -- <<< Value from the previous query
	)
) 
ADD TARGET package0.histogram
(
	SET source = 'package0.callstack', source_type = 1
)
GO
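
The session above only defines the event and the target. To actually collect data you have to start the session, and you can afterwards read the bucketized call stacks directly from the histogram target. The following is just a minimal sketch, assuming the session name SpinlockContention from the previous listing; the XML shredding shown here is only one possible way to get at the slot counts.

-- Start the Extended Events session
ALTER EVENT SESSION SpinlockContention ON SERVER
STATE = START
GO

-- Read the histogram target: every slot is one distinct call stack,
-- together with the number of backoff events bucketized into it
SELECT
    n.value('@count', 'bigint') AS BackoffCount,
    n.value('(value)[1]', 'nvarchar(max)') AS CallStack
FROM
(
    SELECT CAST(t.target_data AS XML) AS target_data
    FROM sys.dm_xe_sessions AS s
    INNER JOIN sys.dm_xe_session_targets AS t
        ON s.address = t.event_session_address
    WHERE s.name = 'SpinlockContention'
        AND t.target_name = 'histogram'
) AS tab
CROSS APPLY tab.target_data.nodes('//HistogramTarget/Slot') AS q(n)
ORDER BY BackoffCount DESC
GO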

As you can see from the CREATE EVENT SESSION listing, I use the histogram target to bucketize on the call stack. Therefore you can see which code path within SQL Server generated the highest backoffs for the specific spinlock type. You can even symbolize the call stack by enabling trace flag 3656. As a prerequisite you need to install the public symbols of SQL Server; Paul Randal (Blog, Twitter) has written a blog posting about it. Here you can see an output from this XEvent session.

sqldk.dll!XeSosPkg::spinlock_backoff::Publish+0x138
sqldk.dll!SpinlockBase::Sleep+0xc5
sqlmin.dll!Spinlock<129,7,1>::SpinToAcquireWithExponentialBackoff+0x169
sqlmin.dll!lck_lockInternal+0x841
sqlmin.dll!XactWorkspaceImp::GetSharedDBLockFromLockManager+0x18d
sqlmin.dll!XactWorkspaceImp::GetDBLockLocal+0x15b
sqlmin.dll!XactWorkspaceImp::GetDBLock+0x5a
sqlmin.dll!lockdb+0x4a
sqlmin.dll!DBMgr::OpenDB+0x1ec
sqlmin.dll!sqlusedb+0xeb
sqllang.dll!usedb+0xb3
sqllang.dll!LoginUseDbHelper::UseByMDDatabaseId+0x93
sqllang.dll!LoginUseDbHelper::FDetermineSessionDb+0x3e1
sqllang.dll!FRedoLoginImpl+0xa1b
sqllang.dll!FRedoLogin+0x1c1
sqllang.dll!process_request+0x3ec
sqllang.dll!process_commands+0x4a3
sqldk.dll!SOS_Task::Param::Execute+0x21e
sqldk.dll!SOS_Scheduler::RunTask+0xa8
sqldk.dll!SOS_Scheduler::ProcessTasks+0x279
sqldk.dll!SchedulerManager::WorkerEntryPoint+0x24c
sqldk.dll!SystemThread::RunWorker+0x8f
sqldk.dll!SystemThreadDispatcher::ProcessWorker+0x3ab
sqldk.dll!SchedulerManager::ThreadEntryPoint+0x226

With the provided call stack, it is not that hard to identify in which area of SQL Server the spinlock contention occurred. In this specific call stack the contention occurred on the LOCK_HASH spinlock type that protects the hashtable of the lock manager. Every time a lock or unlock operation is executed in the lock manager, a spinlock must be acquired on the corresponding hash bucket. As you can also see from the call stack, the spinlock was acquired when calling the function GetSharedDBLockFromLockManager of the class XactWorkspaceImp. That is an indication that a shared database lock was being acquired while connecting to a database, and this finally led to spinlock contention on the LOCK_HASH spinlock with very high backoffs.

If you attend my talk Latches, Spinlocks, and Latch-Free Data Structures at SQLBits (Telford, UK) in 2 weeks, or at the SQLPASS Summit in Seattle in November, I will also show you how you can reproduce this spinlock contention, how to troubleshoot it, and finally how you can resolve it.

Summary

In this blog posting you have learned more about spinlocks in SQL Server. In the first part we discussed why SQL Server needs to implement spinlocks. As you have seen, with spinlocks it’s just cheaper to protect a “busy” shared data structure – like the lock manager – from concurrent thread access. And in the second section we had a more detailed look at how you can troubleshoot spinlock contention in SQL Server, and how you can identify the root cause of the problem with a symbolized call stack.

Thanks for reading!

-Klaus

Introduction to Latches in SQL Server

In today’s blog posting I want to talk about a more advanced, low-level synchronization object used by SQL Server: Latches. A latch is a lightweight synchronization object used by the Storage Engine of SQL Server to protect internal memory structures that can’t be accessed in a true multi-threaded fashion. In the first part of the blog posting I will talk about why there is a need for latches in SQL Server, and in the second part I will introduce various latch types to you, and how you can troubleshoot them.

Why do we need Latches?

Latches were first introduced in SQL Server 7.0, when Microsoft first introduced row-level locking. With row-level locking it was very important to introduce a concept like latching, because otherwise phenomena like Lost Updates in memory could occur. As I stated above, a latch is a lightweight synchronization object used by the Storage Engine to protect memory structures used internally by SQL Server. A latch is nothing more than a so-called Critical Section in multi-threaded programming – with some differences.

In traditional concurrent programming, a critical section is a code path, which must always run single-threaded – with only one thread at a time. A latch itself is a specialized version of a critical section, because it allows multiple concurrent readers. In the context of SQL Server this means that multiple threads can concurrently read a shared data structure, like a page in memory, but writing to that shared data structure must be performed single-threaded.

A latch is used to coordinate the physical execution of multiple threads within a database, whereas a lock is used on a logical level to achieve the required isolation based on the chosen isolation level of the transaction. You, as a developer or DBA, can influence locks in some ways – e.g. through the isolation level, or through the various lock hints that are available in SQL Server. A latch, on the other hand, can’t be controlled in a direct way. There are no latch hints in SQL Server, and there is also no latch isolation level available. The following table compares locks and latches against each other.

Locks vs. Latches

As you can see from that table, latches also support more fine grained modes like Keep and Destroy. A Keep latch is mainly used for reference counting, e.g. when you want to know how many other latches are waiting on a specific latch. And the Destroy latch is the most restrictive one (it even blocks the KP latch), which is used when a latch is destroyed, e.g. when the Lazy Writer wants to free up a page in memory. The following table gives you an overview of the latch compatibility matrix in SQL Server.

Latch Compatibility Matrix

In the following short flipchart demo, I want to show you why latches are needed in SQL Server, and which phenomena would happen without them.

As you have seen in the previous flipchart demo, consistency can’t be achieved in SQL Server with locking alone. SQL Server still has to access shared data structures that are not protected by the lock manager, like the page header. And other components within SQL Server that have single-threaded code paths are also built on the foundation of latches. Let’s therefore continue now with the various latch types in SQL Server, and how you can further troubleshoot them.

Latch Types & Troubleshooting

SQL Server distinguishes between 3 different types of latches:

  • IO Latches
  • Buffer Latches (BUF)
  • Non-Buffer Latches (Non-BUF)

Let’s have a more detailed look at these 3 different variations. I/O Latches are used by SQL Server when outstanding I/O operations against pages in the Buffer Pool are performed – when you read from and write to your storage subsystem. For these I/O latches SQL Server reports wait types that start with PAGEIOLATCH_. You can see the waiting times introduced by these types of latches in the DMV sys.dm_os_wait_stats in the following picture.

PAGEIOLATCH

With these latches SQL Server makes sure that pages are not read into the buffer pool concurrently multiple times, and that pages are not discarded from the buffer pool while they are currently accessed by a query. In addition to the I/O latches SQL Server also supports so-called Buffer Latches, which are used to protect the pages in the buffer pool from concurrently running threads. These are the latches that SQL Server uses to prevent Lost Updates in memory, as I demonstrated in the previous flipchart demo. Without these kinds of latches, it would be possible to read and write a page in the buffer pool concurrently, which would lead to corruption of the pages in main memory. SQL Server reports the waits introduced by these latches with wait types starting with PAGELATCH_. These wait types are again reported through the DMV sys.dm_os_wait_stats. The most important thing here is that when the wait type name does not contain the term IO, you are hitting contention in main memory.

PAGELATCH

And finally SQL Server internally uses so-called Non-Buffer Latches to protect shared data structures besides the buffer pool itself. SQL Server also reports these latches in the DMV sys.dm_os_wait_stats with wait types starting with LATCH_.

LATCH
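
If you want a quick overview of how much your instance is waiting on these three latch categories, you can simply filter sys.dm_os_wait_stats by the wait type prefixes mentioned above. The following is just a minimal sketch of such a query.

-- Summarize the latch related waits since the last restart
SELECT
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGEIOLATCH_%'   -- I/O Latches
    OR wait_type LIKE 'PAGELATCH_%'    -- Buffer Latches
    OR wait_type LIKE 'LATCH_%'        -- Non-Buffer Latches
ORDER BY wait_time_ms DESC
GO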

But the waits reported in this DMV for Non-Buffer Latches are just a summary view of all individual latches that SQL Server uses internally. You can find a further breakdown of the individual latches in a separate DMV – sys.dm_os_latch_stats:

Latch Statistics
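
A minimal sketch of such a breakdown query could look like the following; the BUFFER latch class is excluded here, because waits on buffer latches are already reported through the PAGELATCH_ and PAGEIOLATCH_ wait types.

-- Break down the Non-Buffer Latch waits by individual latch class
SELECT
    latch_class,
    waiting_requests_count,
    wait_time_ms,
    max_wait_time_ms
FROM sys.dm_os_latch_stats
WHERE latch_class <> 'BUFFER'
ORDER BY wait_time_ms DESC
GO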

SQL Server 2014 internally uses 163 latches to synchronize access to shared data structures. One prominent latch is FGCB_ADD_REMOVE, which protects the so-called File Group Control Block (FGCB) of a file group during certain operations like:

  • File growth (manually & automatically)
  • Adding/dropping a file from a file group
  • Recalculating proportional fill weightings
  • Cycling through the files of a file group during the round-robin allocation

When you see high waits on that specific latch, you mainly have a problem with too many auto growth operations, and therefore with bad default settings of your database. When a query tries to read/write a protected data structure and has to wait for a latch, the query is always put into the SUSPENDED state, and has to wait until the latch can be acquired successfully. Therefore the query always goes through the complete query life cycle consisting of the states RUNNING, SUSPENDED, RUNNABLE, and finally RUNNING again. For that reason, enforcing the protection of a shared data structure with a latch only makes sense when the query holds the latch for a long time, because changing the state of the query also means performing a context switch in the Windows OS, which is a very expensive operation in terms of CPU cycles.

Therefore it doesn’t make sense to put a latch in front of a shared data structure that is read or written very frequently and only for a very short amount of time. In that case the needed context switches would kill the overall performance of SQL Server, and it would take too much time to go through the complete query life cycle (RUNNING, SUSPENDED, RUNNABLE). That’s the area where SQL Server introduces so-called Spinlocks. The Lock Manager is a good example of such a data structure: it needs single-threaded access when locking and unlocking data items (like a record, a page, etc.). But when you look at sys.dm_os_latch_stats, you will find no latch that protects the Lock Manager itself. The corresponding hash bucket in the hashtable used by the Lock Manager is protected by a spinlock – the LOCK_HASH spinlock. That spinlock must be acquired before executing a lock or unlock operation through the Lock Manager. But today I am not going to talk about spinlocks, because I’m planning a dedicated blog posting just about them – so you will have to wait for that :-)

Summary

In this blog posting we have looked at latches in SQL Server. As you have seen latches are lightweight synchronization objects used by SQL Server to protect shared data structures in memory. SQL Server distinguishes between 3 different types of latches – IO Latches, Buffer Latches, and Non-Buffer Latches. You have also seen how you can troubleshoot latch waits with DMVs like sys.dm_os_wait_stats and sys.dm_os_latch_stats.

Thanks for reading & watching!

-Klaus  

SQL Server Performance Tuning & Troubleshooting Workshop in Antwerpen/Belgium

I’m very happy to announce today that I’m also running my famous SQL Server Performance Tuning & Troubleshooting Workshop from November 24 – 28 in Antwerpen/Belgium. Here’s the outline of the workshop:

“Are you an SQL Server DBA or developer who has encountered performance related problems with SQL Server? Yesterday your SQL Server was running very smoothly, but today’s performance is very bad and you have no idea what the problem is or how to solve it? For that reason we are running our SQL Server Performance Tuning & Troubleshooting Workshop, to help you react in situations like the above.”

If you are interested in more information about the workshop, the detailed agenda, and the pricing, please have a look at the workshop website itself.

Thanks for reading!

-Klaus

Configuring the In-Memory OLTP File Group for High Performance

In today’s blog posting I want to talk about the memory optimized file group used by In-Memory OLTP to achieve durability, and how to configure it for high performance throughput. Before we get into the details, I want to give you a brief overview of how In-Memory OLTP achieves durability with this specialized file group in your database.

Durability for In-Memory OLTP

One of the big misconceptions in the early days of In-Memory OLTP (formerly code-named Hekaton) was that people thought In-Memory OLTP doesn’t provide ACID transactions – only ACI ones, without the aspect of Durability. But this isn’t true, because every operation done in memory is fully logged in In-Memory OLTP. If your database crashes, In-Memory OLTP is able to bring your database and your memory optimized tables back to the state before the crash occurred. In-Memory OLTP logs every operation to the traditional transaction log of SQL Server. Because everything happens in non-durable memory based on MVCC principles, In-Memory OLTP only logs redo log records and no undo log records, because an undo phase will never happen during crash recovery with In-Memory OLTP. And to perform a rollback of an in-flight transaction, the previous version of a record is always available.

Redo log records are also only written when a transaction commits. In addition, the so-called Offline Checkpoint Worker copies successfully committed transactions from the transaction log into pairs of so-called Data and Delta Files. As with your regular SQL Server data files, the data and delta files of In-Memory OLTP are there to speed up crash recovery. Crash recovery initially constructs your memory optimized tables from the data and delta file pairs, and afterwards all changes that occurred since the last checkpoint process are applied from the transaction log. Let’s have a look at this concept.

In-Memory OLTP Crash Recovery

Because everything in In-Memory OLTP is about maximum performance, the data and delta file pairs are only written with sequential I/O by SQL Server. There is no random I/O involved, because that would kill performance. The question now is what information is written to the data and delta file pairs. The data file of In-Memory OLTP consists of the records that were inserted into your memory optimized table. Because inserts only happen at the end of the file, sequential I/O can be achieved very easily. When you delete a specific record in your memory optimized table, that record is marked as logically deleted in the corresponding delta file. That information is also always appended at the end of the delta file, which again leads to true sequential I/O on the storage. And an UPDATE statement is just a combination of an INSERT and a DELETE operation of the new and old copy of the record. Easy, isn’t it? The following picture illustrates this very important concept.

Data & Delta Files in In-Memory OLTP

Tony Rogerson (Blog, Twitter) has also written a very detailed blog posting about how durability is achieved with In-Memory OLTP. I highly recommend that reading material for a further deep dive into the topic!

Configuring the In-Memory OLTP file group

Crash recovery of memory optimized tables is performed at the speed of the storage where your data and delta file pairs are stored. For that reason it is very important to think carefully about the configuration of the In-Memory OLTP file group that stores the file pairs when you create your database. In the first step you always have to add a new file group that contains memory optimized data, as you can see in the following listing:

-- Add a new memory optimized file group
ALTER DATABASE InMemoryOLTP
ADD FILEGROUP InMemoryOLTPFileGroup CONTAINS MEMORY_OPTIMIZED_DATA
GO

After the creation of the memory optimized file group (which is, under the covers, a traditional FILESTREAM file group), you can add a storage container to the file group. You use the ADD FILE command here, as demonstrated in the following listing:

-- Add a new storage container
ALTER DATABASE InMemoryOLTP ADD FILE
(
	NAME = N'InMemoryOLTPContainer', 
	FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\InMemoryOLTPContainer'
)
TO FILEGROUP [InMemoryOLTPFileGroup]
GO

Almost all examples that you currently encounter in the various blog postings around the world just add 1 storage container – and that will hurt your performance tremendously! The reasons are very simple:

  • Data and Delta files are stored on the same physical drive
  • Writing to Data and Delta files on the same physical drive leads to random I/O
  • Crash recovery can only be as fast as the one physical drive where the Data and Delta files are stored

To overcome all these limitations, you can add multiple storage containers on different *physical* drives to the memory optimized file group. In that case, Data and Delta files are allocated in a round-robin fashion between the multiple storage containers. Imagine what happens when you configure 2 storage containers:

  • The 1st data file goes into the 1st storage container
  • The 1st delta file goes into the 2nd storage container
  • The 2nd data file goes into the 1st storage container
  • The 2nd delta file goes into the 2nd storage container

The following picture illustrates this concept to you in more detail.
Round Robin Allocation
But using only 2 storage containers doesn’t solve your disk bottleneck problem, because with only 2 containers, all your data files are stored in the 1st container, and all your delta files are stored in the 2nd container. Normally your data files are quite a bit larger than your delta files, which means your I/O will not be balanced between the two storage containers on different physical drives. The 1st storage container, with all of your data files, will need to provide more IOPS than the 2nd one with all your delta files.

To distribute and balance the IOPS between multiple physical drives evenly, Microsoft recommends that you use at least 4 storage containers for your memory optimized file group. Imagine what happens when you use 4 storage containers, spread across 2 physical drives, as illustrated in the following picture:

Round Robin Allocation with 4 Storage Containers
With that setup you have the 1st file pair (data & delta file) on the 1st physical drive, the 2nd file pair on the 2nd physical drive, the 3rd file pair on the 1st physical drive, and so on. You have finally spread your I/O requirements evenly across multiple physical drives, which will also speed up the crash recovery process, because crash recovery can process the storage containers in parallel and therefore bring your database online faster.
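
As a concrete illustration, the following listing sketches such a configuration with 4 storage containers spread across 2 physical drives. The drive letters E: and F: and the container names are just hypothetical placeholders for 2 separate physical drives – adjust them to your own environment.

-- Add 4 storage containers to the memory optimized file group,
-- spread across 2 (hypothetical) physical drives E: and F:
ALTER DATABASE InMemoryOLTP ADD FILE
(NAME = N'InMemoryOLTPContainer1', FILENAME = N'E:\InMemoryOLTP\Container1')
TO FILEGROUP [InMemoryOLTPFileGroup]
GO

ALTER DATABASE InMemoryOLTP ADD FILE
(NAME = N'InMemoryOLTPContainer2', FILENAME = N'E:\InMemoryOLTP\Container2')
TO FILEGROUP [InMemoryOLTPFileGroup]
GO

ALTER DATABASE InMemoryOLTP ADD FILE
(NAME = N'InMemoryOLTPContainer3', FILENAME = N'F:\InMemoryOLTP\Container3')
TO FILEGROUP [InMemoryOLTPFileGroup]
GO

ALTER DATABASE InMemoryOLTP ADD FILE
(NAME = N'InMemoryOLTPContainer4', FILENAME = N'F:\InMemoryOLTP\Container4')
TO FILEGROUP [InMemoryOLTPFileGroup]
GO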

Summary

As you have seen in this blog posting, storage speed and throughput are still very important for In-Memory OLTP. Your data must still be persisted physically in the storage, otherwise the data would be lost when your SQL Server crashes or is restarted. The configuration of your memory optimized file group has a huge impact on the throughput that the file group gives you. If you run In-Memory OLTP in a production environment, you should have at least 4 storage containers spread across 2 physical drives. In more performance demanding scenarios you can even think about adding more storage containers on more than 2 physical drives. As Microsoft says: “Crash recovery in In-Memory OLTP is performed at the speed of your storage”.

Thanks for reading!

-Klaus

Announcing the SQL Server Query Tuning Workshop

Today I’m very happy and proud to announce my brand new SQL Server Query Tuning Workshop that I will run later this year at multiple locations across Europe.

The goal of this workshop is very simple: I want to teach you the craft of writing better performing T-SQL queries, and show you how you can analyze and troubleshoot query related performance problems in SQL Server:

  • Learn how to write high performance T-SQL queries
  • Understand and apply the difference between logical and physical query processing
  • Get detailed knowledge about execution plans and how to optimize performance
  • Apply indexing strategies to your performance problems
  • Evaluate whether In-Memory technologies make sense for your database

The workshop is currently scheduled for the following locations and dates:

  • October 13 – 16 in Zurich/Switzerland
  • October 20 – 23 in London/United Kingdom
  • October 27 – 30 in Vienna/Austria
  • November 17 – 20 in Utrecht/Netherlands

If you are interested in more information about the workshop, the detailed agenda, and the pricing, please have a look at the workshop website itself.

Thanks for reading!

-Klaus

SQL Server Quickie #14 – The Tipping Point

Today I have uploaded the 14th SQL Server Quickie to YouTube. This time I’m talking about the Tipping Point in SQL Server. You can find the script used during the demonstration here as a download.

Thanks for watching!

-Klaus