Recompilations, Part 2

In last week’s blog posting I talked about how the use of temp tables can cause recompilations in SQL Server. As mentioned at the very end of that posting, today I want to concentrate a little bit more on Table Variables, which can avoid the expensive overhead of recompilations. Let’s have a more detailed look at them.

Table Variables

A table variable is always scoped to a batch submitted to SQL Server. If you reference a table variable outside of that batch, you get back an error message. That’s the first big difference compared to temp tables. The following code shows how you can create and use table variables – again in the context of a simple stored procedure.
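
A minimal sketch of such a procedure could look like this (the object names are made up for illustration):

CREATE PROCEDURE DemoTableVariable
AS
BEGIN
    -- The table variable is scoped to this stored procedure
    DECLARE @TempRecords TABLE
    (
        ID INT NOT NULL PRIMARY KEY,
        RecordName VARCHAR(50) NOT NULL
    )

    -- Insert some records into the table variable
    INSERT INTO @TempRecords (ID, RecordName)
    VALUES (1, 'Record 1'), (2, 'Record 2'), (3, 'Record 3')

    -- Retrieve the inserted data
    SELECT * FROM @TempRecords
END
GO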

The nice thing about table variables is that they don’t cause any recompilations. When you execute the stored procedure and monitor recompilations with the Extended Event session from last week, you will see that no recompilations happen.

Why is this possible with table variables? First of all, a table variable is just a variable – as the name implies. This means that you don’t change your database schema when you define a table variable, so recompilations based on database schema changes are avoided. In addition, a table variable has no statistics, so no statistics have to be maintained, and the second cause of recompilations just disappears.

Both things sound great at first, but huge drawbacks appear when we look further. Let’s have a look at them. A table variable is just a variable – almost! A table variable is still persisted in TempDb! Yes, you read that correctly: when you work with a table variable, physical I/O in TempDb is involved. This is very easy to prove with the Dynamic Management View sys.dm_db_session_space_usage, which tracks TempDb usage at the session level. Let’s have a look at the following code:
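
A sketch of such a test (again with illustrative names):

-- Declare a table variable and insert some records into it
DECLARE @TempRecords TABLE
(
    ID INT NOT NULL PRIMARY KEY,
    RecordName VARCHAR(50) NOT NULL
)

INSERT INTO @TempRecords (ID, RecordName)
VALUES (1, 'Record 1'), (2, 'Record 2'), (3, 'Record 3')
GO

-- The table variable is now out of scope, but the DMV shows that
-- pages were allocated (and deallocated) for it in TempDb
SELECT user_objects_alloc_page_count, user_objects_dealloc_page_count
FROM sys.dm_db_session_space_usage
WHERE session_id = @@SPID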

As you can see from the output of the DMV, 5 pages were allocated in TempDb for the table variable. Because the table variable is already out of scope, these 5 pages are already marked for deallocation. Be aware of that side-effect!

Table variables also have no statistics, so no recompilations happen here. But as a side-effect the Query Optimizer always estimates just 1 row. And that can be very, very bad. Imagine you join your table variable to another table in your database. Because of the 1-row estimation, the Query Optimizer will introduce the Nested Loop Join operator into the execution plan and reference the table variable as the outer table. If in reality you return 10000 or even more rows, the whole execution plan isn’t really optimal anymore. Let’s have a look at the following example:
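
The following sketch shows the idea, assuming the AdventureWorks sample database (the original example may have used different tables):

-- Fill a table variable with a few thousand rows
DECLARE @Orders TABLE
(
    SalesOrderID INT NOT NULL PRIMARY KEY,
    CustomerID INT NOT NULL
)

INSERT INTO @Orders (SalesOrderID, CustomerID)
SELECT SalesOrderID, CustomerID
FROM Sales.SalesOrderHeader

-- The Query Optimizer estimates just 1 row for the table variable and
-- therefore references it as the outer table of a Nested Loop Join
SELECT o.SalesOrderID, c.AccountNumber
FROM @Orders o
INNER JOIN Sales.Customer c ON c.CustomerID = o.CustomerID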

When you have a more detailed look at the Clustered Index Scan operator, you can see that the cardinality estimation is here just 1 row and is nowhere near the actual number of rows returned by the operator.

Wrong Cardinality Estimation

You can fix that wrong cardinality estimation by using a Statement-Level Recompilation, which was introduced back in SQL Server 2005.
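
Applied to the previous query, that would look like this:

-- OPTION (RECOMPILE) recompiles just this statement, so the Query
-- Optimizer can see the actual number of rows in the table variable
SELECT o.SalesOrderID, c.AccountNumber
FROM @Orders o
INNER JOIN Sales.Customer c ON c.CustomerID = o.CustomerID
OPTION (RECOMPILE)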

But this approach is a little bit counter-productive, because you again introduce a recompilation overhead that you wanted to avoid with the use of table variables…

Summary

With table variables you can avoid the overhead of recompilations in SQL Server – with some side-effects. The biggest one is the wrong cardinality estimation of just 1 row. Therefore table variables only make sense when you deal with a small number of rows, because then the wrong cardinality estimation doesn’t really matter and doesn’t affect your performance. With a larger set of rows it would hurt your performance because of the inefficient execution plan that is generated.

As a general rule of thumb you should use temp tables for large amounts of data, and table variables for small amounts of data. But you really have to benchmark it with your own workload to decide when a temp table and when a table variable makes sense.

Thanks for your time,

-Klaus

Recompilations, Part 1

This week and next week I want to talk about a very important performance tuning topic in SQL Server: Recompilations. As you will see, they can happen even when you execute very simple stored procedures. Today I want to lay out the foundations of recompilations in SQL Server and why they have to happen, and next week I will show you how you can avoid them by rewriting your stored procedures in a different way.

Recompilations – what?

Before I talk about the specifics of recompilations in SQL Server, I want you to have a more detailed look at the following simple stored procedure.
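
A minimal sketch of such a procedure (the object names are made up for illustration):

CREATE PROCEDURE DemoTempTable
AS
BEGIN
    -- Creating the temp table changes the database schema of TempDb.
    -- The PRIMARY KEY constraint also creates a Clustered Index.
    CREATE TABLE #TempRecords
    (
        ID INT NOT NULL PRIMARY KEY,
        RecordName VARCHAR(50) NOT NULL
    )

    -- The inserted rows lead to a statistics update on the Clustered Index
    INSERT INTO #TempRecords (ID, RecordName)
    VALUES (1, 'Record 1'), (2, 'Record 2'), (3, 'Record 3')

    -- Finally the inserted data is retrieved again
    SELECT * FROM #TempRecords
END
GO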

As you can see from the code, there is nothing special about it. In the first step a temp table is created. Afterwards some records are inserted into the temp table, and finally the inserted data is retrieved with a simple SELECT statement. Code like this has been written hundreds – maybe thousands – of times in SQL Server. Now let’s create an Extended Event session that tracks recompilations. The following code shows the definition of this event session.
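
A sketch of such an event session (the session and file names are illustrative):

-- Tracks every recompilation through the sql_statement_recompile event
CREATE EVENT SESSION TrackRecompilations ON SERVER
ADD EVENT sqlserver.sql_statement_recompile
(
    ACTION (sqlserver.sql_text)
)
ADD TARGET package0.event_file (SET filename = N'C:\Temp\TrackRecompilations.xel')
GO

ALTER EVENT SESSION TrackRecompilations ON SERVER STATE = START
GO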

When you now execute the stored procedure, you will see that 2 recompilations have happened during the runtime.

2 recompilations for a simple stored procedure

A recompilation means that SQL Server recompiled the execution plan during the execution itself. This introduces additional CPU overhead that ultimately decreases the throughput of your workload. But for now the real question is: why did these recompilations happen?

SQL Server mainly performs recompilations under the following 2 circumstances:

  • Schema Changes
  • Statistics Updates

And because of these 2 circumstances we triggered the recompilations in the previously executed stored procedure. When you look again at the stored procedure definition, you can see that we created a temp table in the first step. When you create a temp table in TempDb, you change your database schema, and for this reason the first recompilation happened.

And after the creation of the temp table, we inserted some rows into it. Therefore SQL Server had to update the underlying statistics object of the Clustered Index of the temp table – the Clustered Index that SQL Server created through the PRIMARY KEY constraint – which triggered the second recompilation. A simple stored procedure caused 2 recompilations in SQL Server – wow!

Summary

Temp tables can have a huge performance overhead caused by recompilations. On the other hand, temp tables have accurate statistics that help the Query Optimizer to produce better performing execution plans. Therefore temp tables only make sense when you deal with a larger set of data. For a smaller set of data the CPU overhead caused by the recompilations would be larger than the performance gain through the statistics.

In next week’s blog posting I will introduce Table Variables to you, and we will see how they can avoid the nasty side-effects of recompilations – and what other performance problems they can introduce. Stay tuned.

Thanks for your time,

-Klaus

How to feel the SQLpassion in your veins

The last month was really time consuming for me: I ran my brand-new SQL Server Query Tuning Workshop 3 times across Europe – in Zurich/Switzerland, London/United Kingdom, and finally in Vienna/Austria. The workshop runs for 4 days, and I teach almost everything you need to know to write better and faster performing SQL queries for SQL Server.

I don’t want to talk too much about the workshop itself, because Arun, Ina, Bernd, and Thomas were kind enough to provide some feedback about the workshop from their perspective as attendees. So let’s sit back and enjoy listening to them.

Arun from London/United Kingdom

“My day-to-day writing of SQL has gone up a level – I am able to finely tune individual statements or underlying database objects to get the kind of query plan I want. Moreover I know now what sort of plan I am looking for. I have been already able to use this knowledge to address performance issues that came up this week in capacity testing.”

Arun feels now SQLpassion in his veins

Ina from Oberhaching/Germany

“Anyone faced with the task to implement queries on complex ERP data structures – for reporting, providing views, or for statistical analysis – comes quickly to its limits without knowledge of the operation of SQL Server. The workshop provides an excellent insight into the technical world of SQL Server, explained with many examples of queries and their optimization.”

Ina feels now SQLpassion in her veins

Bernd from Monheim/Germany

“After attending the 4 days long workshop you know very exactly how SQL Server executed SQL queries on the logical and physical level through execution plans. In addition you will know how you can improve queries by applying the right indexing strategy. This in-deep knowledge was very well explained and presented by Klaus.” 

Bernd feels now SQLpassion in his veins 

Thomas from St. Margrethen/Switzerland

“I was lucky enough to attend the 4 day long SQL Server Query Tuning Workshop from Klaus in Zurich. He has presented in a very easy way the performance related internals of SQL Server that will help you to write better performing SQL queries. Based on that knowledge I finally know now why sometimes queries are fast, and sometimes are slow. It was an amazing workshop, which makes the day-to-day work with SQL Server now very easy.”

Thomas feels now SQLpassion in his veins

Wow, that’s really amazing! It makes me really proud to see the SQLpassion in their faces, and to see how the workshop has already helped them in their day-to-day life with SQL Server. If you also want to feel the SQLpassion in your veins when working with SQL Server, you can sign up for the last SQL Server Query Tuning Workshop in 2014, in Utrecht/Netherlands from November 17 – 21.

If you are attending the SQLPASS Summit in Seattle this week, make sure to say hi to me. I will try to spend as much time as possible at the Community Zone.

Nice greetings from the always rainy Seattle ;-)

-Klaus

SQL Server Quickie #19 – The Halloween Problem

Today I have uploaded the 19th SQL Server Quickie to YouTube. This time I’m talking about the Halloween Problem in SQL Server.

Thanks for watching!

-Klaus

Parameter Sniffing, Part 2

In last week’s blog posting I introduced the basic concept and the problems behind Parameter Sniffing in SQL Server. As you have seen, it can lead to serious performance problems when a cached plan is blindly reused by SQL Server. Today I want to show you how you can deal with this problem, and how to overcome it using a number of different techniques.

Indexing

The underlying root cause of the Parameter Sniffing problem that we discussed last week is the fact that in some cases the SQL statement produced a Bookmark Lookup and in other cases a Table/Clustered Index Scan operator in the execution plan. If you are able to change the indexes in your database, the easiest way out is to provide a Covering Non-Clustered Index for this specific query. In our case you have to include the additionally requested columns from the Bookmark Lookup in the leaf level of the Non-Clustered Index. If you do that, you have also achieved so-called Plan Stability: regardless of the provided input parameters, the Query Optimizer always compiles the same execution plan – in our case with an Index Seek (NonClustered) operator.
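
Assuming the demo table from last week looks something like Table1 (Col1, Col2, Col3) with a Non-Clustered Index on Col2 (the names are illustrative), the covering index could be created like this:

-- Including the additionally requested column in the leaf level
-- removes the Bookmark Lookup and gives you Plan Stability
CREATE NONCLUSTERED INDEX idx_Table1_Col2_Covering ON Table1 (Col2)
INCLUDE (Col3)
GO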

A Covering Non-Clustered Index is accessed

If you don’t have a chance to work on your indexing strategy (maybe you deal with a 3rd party application, where you are not allowed to make indexing changes), you can work with a number of “transparent” SQL Server options that I will describe in the following sections.

Recompilations

The first option that SQL Server offers you is a recompilation of the execution plan. SQL Server provides 2 different options for you to use:

  • A recompilation of the whole stored procedure
  • A recompilation of the problematic SQL statement – a so-called Statement-Level Recompilation (available since SQL Server 2005)

Let’s have a more detailed look at both options. The following code shows how you can apply a recompilation of the whole stored procedure with the RECOMPILE option.
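
A sketch, with the illustrative names used throughout this posting:

CREATE PROCEDURE RetrieveData
(
    @Col2Value INT
)
WITH RECOMPILE -- the whole stored procedure is recompiled on every execution
AS
BEGIN
    SELECT * FROM Table1
    WHERE Col2 = @Col2Value
END
GO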

When you run such a stored procedure, the Query Optimizer recompiles it at the beginning of every execution. Therefore you always get an execution plan that is optimized for the currently provided input parameter values. As a side-effect the execution plan no longer gets cached, because it doesn’t make sense to cache a query plan that is recompiled every time. When you have a large, complicated stored procedure, a RECOMPILE query hint at the stored procedure level doesn’t always make sense, because the whole stored procedure is recompiled.

Maybe you have a Parameter Sniffing problem in just one specific SQL statement. In that case the overhead of recompiling the whole stored procedure would be too much. For that reason SQL Server, since version 2005, offers the so-called Statement-Level Recompilation: you are able to mark a specific SQL statement for recompilation instead of the complete stored procedure. Let’s have a look at the following code.
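
Here the same (hypothetical) procedure is reworked so that only the problematic statement is recompiled:

CREATE PROCEDURE RetrieveData
(
    @Col2Value INT
)
AS
BEGIN
    -- This statement is compiled once, and its plan is cached for reuse
    SELECT COUNT(*) FROM Table1

    -- Only this statement is recompiled on every execution
    SELECT * FROM Table1
    WHERE Col2 = @Col2Value
    OPTION (RECOMPILE)
END
GO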

In that example the second SQL statement is recompiled every time that the stored procedure is executed. The first statement is compiled during the initial execution, and the generated plan is cached for further reuse. That’s the preferred way to deal with Parameter Sniffing when you have no influence on the indexing strategy of your database.

OPTIMIZE FOR

In addition to the recompilation of a stored procedure or the SQL statement, SQL Server also offers you the OPTIMIZE FOR query hint. With that query hint you are able to tell the Query Optimizer for which specific parameter values the generated query plan should be optimized. Let’s have a more detailed look at the following example.
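
A sketch, again with the illustrative names from above:

CREATE PROCEDURE RetrieveData
(
    @Col2Value INT
)
AS
BEGIN
    -- The plan is always optimized for @Col2Value = 1, regardless
    -- of the parameter value actually provided at runtime
    SELECT * FROM Table1
    WHERE Col2 = @Col2Value
    OPTION (OPTIMIZE FOR (@Col2Value = 1))
END
GO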

As you can see from the stored procedure definition, the execution plan of the SQL statement is always optimized for an input value of 1 for the parameter @Col2Value. Regardless of which input value you provide for this parameter, you will always get a plan compiled for the value of 1. With this approach you are already working with a sledgehammer against SQL Server, because the Query Optimizer no longer has any choice – it must always produce a plan optimized for the parameter value of 1. You should only implement this query hint when you know that a query plan optimized for a specific parameter value should almost always be generated. On the plus side, your query plans become predictable when you restart SQL Server or when you perform a cluster failover.

If you are going to go down this route, you really have to know your data distribution, and you also need to know when your data distribution changes. If the data distribution changes, you also have to review your query hint to see if it’s still appropriate. You can’t rely on the Query Optimizer here, because you have just overruled the Query Optimizer with the OPTIMIZE FOR query hint. You must always keep this in mind!

In addition to the OPTIMIZE FOR query hint, SQL Server also offers the OPTIMIZE FOR UNKNOWN query hint. If you decide to use that query hint, the Query Optimizer uses the Density Vector of the underlying statistics object to derive the cardinality. The plan that the Query Optimizer generates for you then depends on the overall data distribution. If your logical reads are over the Tipping Point, you end up with a Table/Clustered Index Scan…

Summary

In this blog posting I have shown you multiple ways to deal with the Parameter Sniffing problem in SQL Server. One of the most common root causes of this specific problem is a bad indexing strategy, where with a selective parameter value the Query Optimizer introduces a Bookmark Lookup into the execution plan. If such a plan gets reused in the future, your I/O costs will explode. I’ve already seen execution plans in production that generated more than 100 GB of logical reads, just because of this problem. A simple RECOMPILE query hint at the statement level fixed the problem, and the query then produced just a few logical reads.

If you can’t influence the indexing strategy of the database, you can work with the RECOMPILE query hint at the stored procedure or SQL statement level. As a side effect, the recompiled plan will no longer be cached. In addition to these query hints, SQL Server also offers you the OPTIMIZE FOR and OPTIMIZE FOR UNKNOWN query hints. If you work with these query hints, you really have to know your data and your data distribution, because you are overruling the Query Optimizer. Always be aware of this fact!

Thanks for your time,

-Klaus

Parameter Sniffing, Part 1

In today’s blog posting I want to talk about one major performance problem that I see very often during SQL Server consulting engagements – the so-called Parameter Sniffing problem. This problem occurs when you work with parameterized SQL statements like stored procedures in combination with the plan caching mechanism of SQL Server. In the first part of this blog posting I will introduce the scope of this problem to you, and in the second part next week I will show you several techniques for solving it.

Parameter Sniffing – what?

When you execute a parameterized SQL query in SQL Server, the Query Optimizer compiles an execution plan based on the first provided parameter values. Afterwards the generated query plan is cached in the plan cache for further reuse, which means that SQL Server will reuse the plan without regard for the parameter values that you provide later on. We need to distinguish between 2 kinds of parameter values:

  • Compile time values
  • Runtime values

The Compile time values are the parameter values that the Query Optimizer uses to generate a physical execution plan. The Runtime values are the values that you provide for the execution of the query plan. For the first execution these values are identical, but for subsequent executions they will probably be different. This can introduce serious performance problems, because your execution plan is always optimized for the Compile time values, and not for the various Runtime values that you subsequently provide.

Imagine that during the first query execution you provide a very selective value, and the Query Optimizer chooses a Non-Clustered Index Seek operation followed by a Bookmark Lookup operation to retrieve additional requested columns from your base table. Such an execution plan only makes sense with a selective value, because with a non-selective value your logical reads would be too high, and SQL Server would just scan the whole table and discard non-qualifying rows through a residual predicate in the Scan operator. The point where SQL Server decides between the two plan shapes is the so-called Tipping Point.

Imagine now what happens if a query plan with a Bookmark Lookup gets cached, and afterwards gets blindly reused with a non-selective value. In that case SQL Server doesn’t implement any protection mechanism and will just execute the found query plan from the plan cache. As a side-effect your I/O costs (your logical reads) will just explode, and the query will end up with very bad performance. To show you a concrete example of this behavior, the following code creates a simple table with an uneven data distribution in the second column.
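
A minimal sketch of such a table (the names are made up for illustration):

CREATE TABLE Table1
(
    Col1 INT IDENTITY(1, 1) PRIMARY KEY,
    Col2 INT NOT NULL,
    Col3 CHAR(100) NOT NULL
)
GO

-- The value 1 occurs only once...
INSERT INTO Table1 (Col2, Col3) VALUES (1, 'Some data')
GO

-- ...and the value 2 occurs 1499 times
INSERT INTO Table1 (Col2, Col3) VALUES (2, 'Some data')
GO 1499

-- A Non-Clustered Index on the unevenly distributed column
CREATE NONCLUSTERED INDEX idx_Table1_Col2 ON Table1 (Col2)
GO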

When you have a look at the table, you can see that the value 1 occurs only once, and the value 2 occurs 1499 times. This is a simple example of an uneven data distribution, but you will often see this in your own tables: you have a lot of customers in your own country, and just a few customers in other countries. Based on that uneven data distribution, a Non-Clustered Index on this column, and the Tipping Point, you get 2 different execution plans for the same logical query:

2 different Execution Plans

When you now create a stored procedure, the Query Optimizer will generate the execution plan based on the first provided parameter values, and afterwards the query plan will be blindly reused by subsequent executions.
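
A sketch of such a procedure, together with the two executions discussed below:

CREATE PROCEDURE RetrieveData
(
    @Col2Value INT
)
AS
BEGIN
    SELECT * FROM Table1
    WHERE Col2 = @Col2Value
END
GO

-- The first execution compiles and caches a plan with a Bookmark Lookup...
EXEC RetrieveData @Col2Value = 1

-- ...which is then blindly reused for the non-selective value
EXEC RetrieveData @Col2Value = 2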

When you now execute the stored procedure with the parameter value of 1, where you return just one record, the Query Optimizer chooses a Bookmark Lookup in the execution plan. The query itself produces just 3 logical reads. But when you next execute the stored procedure – this time with a parameter value of 2 – the cached plan gets reused and the Bookmark Lookup is done 1499 times – over and over again. For every returned record! The query now results in 1505 logical reads. That’s a huge difference from the previous execution. When you look at the properties window of the SELECT operator you can also see that we have a parameter list. For every parameter SQL Server shows the Compile time and Runtime values.

Different Compile-Time and Runtime Values

And as you can see they are different. The Compile time value is 1, and the Runtime value is 2. This means that the execution in front of you was optimized for an input parameter value of 1, but actually you executed the stored procedure with a parameter value of 2. That’s the Parameter Sniffing problem in SQL Server.

Continue reading in Part 2.

Summary

As you have seen, it is very easy in SQL Server to hit this specific problem. Every time you use parameterized SQL queries (such as in stored procedures) you have to deal with this side-effect when you have an uneven data distribution in your table and your Non-Clustered Index is not a covering one. Now that I have introduced you to the problem, next week I will show you how you can deal with it, and which solutions SQL Server provides for you.

Thanks for your time,

-Klaus

My SQLBits Attendee feedback

A few weeks ago I got my SQLBits attendee feedback, so I want to talk about it today. As you might know, I presented 2 sessions in Telford: one about Latches, Spinlocks, and Lock-Free Data Structures, and another one about the Dangerous Beauty of Bookmark Lookups.

Latches, Spinlocks, and Lock Free Data Structures

Let’s have a look at the first session about Latches, Spinlocks, and Lock Free Data Structures. This will be – BTW – the same session that I will present at the SQLPASS Summit in Seattle in a few weeks. In this session I start by talking about thread synchronization mechanisms, and how SQL Server implements these mechanisms with Latches. To give you a more concrete idea about Latches, I also talk about the Last Page Insert Latch Contention that can occur on a Clustered Index. The following picture shows the feedback score of this session.

Session Feedback

Not that bad ☺. This was the first time I delivered this session in public, so I’m really happy that it worked out very well and that attendees liked it. It’s always a gamble whether people will like a session or not. I also want to share a few feedback comments from my attendees:

“For the sessions I attended, this was the stand out one. Delivered by Klaus who is a master at speaking/training. Good pace, subtle humour thoroughly loved the session.“

“Great really detailed session from Klaus. I was surprised it was a little bit quiet – I’m sure Klaus packed out a room double the size in Liverpool a few years ago / possibly too specialist for most, but I enjoyed it.“

I thought the same about my session after I finished. I had quite a large room, but the room was only half full. It’s quite an in-depth session (around level 400 – 500); maybe that scared some people away from attending. But trust me: I don’t bite! Let’s see how it works out in Seattle, because there I also have a large room for my session.

The Dangerous Beauty of Bookmark Lookups

I have lost count of how often I have delivered this session. I know it inside out, and I could almost deliver it if you woke me up in the middle of the night. In the first step of this session I always give an introduction to Bookmark Lookups: why they are needed in SQL Server, and how you can resolve them by providing a Covering Non-Clustered Index for your problematic query. Based on that foundation I then go into the nasty details of Bookmark Lookups and show several scenarios where they can lead to serious performance problems. I talk about the so-called Tipping Point in SQL Server, for quite a long time about Parameter Sniffing, and finally about Bookmark Lookup Deadlocks, which always scare people. The following picture shows again the feedback score of this session.

Session Feedback

It’s still quite a good result, but not as good as for the previous session. From the feedback comments I have seen that some people thought the session was DBA centric, but this isn’t the case: it’s a developer oriented session. Based on that great input I will rework my session abstract a little bit, so that people immediately know what they will be getting when they attend this session. Here are some feedback comments:

”Great session, but not technical enough for DBAs; I guess this was more aimed at Developers.”

”Klaus you are a legend!!!! I loved this session!!!! The best of SQLBits!!!!”

The last comment really makes me feel proud ☺. I always like it when I can share my passion for SQL Server with my attendees.

Thanks for your time,

-Klaus

SQL Server Quickie #18 – Bookmark Lookups

Today I have uploaded the 18th SQL Server Quickie to YouTube. This time I’m talking about Bookmark Lookups in SQL Server.

Thanks for watching!

-Klaus

Lock-Free Data Structures

A few weeks ago I blogged about Latches and Spinlocks in SQL Server. Both synchronization primitives are used to protect shared data structures within SQL Server, like pages in the Buffer Pool (through Latches) or locks in the Lock Manager’s hashtable (through a Spinlock). A completely new synchronization paradigm that you will see more and more is so-called Lock-Free Data Structures. That’s also one of the foundations on which In-Memory OLTP in SQL Server 2014 is built. Therefore, in today’s blog posting I want to give you a quick overview of and a general introduction to Lock-Free Data Structures and what they have to offer.

What are Lock-Free Data Structures?

A lock-free algorithm protects a shared data structure through a non-blocking algorithm. As you have already seen in the previous blog postings about Latches and Spinlocks, other threads are blocked when they can’t acquire a Latch or a Spinlock. When a thread waits for a Latch, SQL Server puts the thread into the SUSPENDED state; when a thread waits for a Spinlock, the thread has to spin actively on the CPU. Both approaches lead to a blocking scenario, which we want to avoid through a Non-Blocking Algorithm. Wikipedia has a really nice explanation of Non-Blocking Algorithms:

“A non-blocking algorithm ensures that threads competing for a shared resource do not have their execution indefinitely postponed by mutual exclusion. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress regardless of scheduling.”

The most important point to pick up from this explanation is that a thread will not be blocked by another thread. This is possible, because no traditional locks are used for the thread synchronization itself. Let’s have a more detailed look at a concrete example. 

Spinlock Implementation
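
The following is a simplified illustration in C of the idea (as mentioned below, it is not thread-safe, it just shows the logic):

// Illustrates the logic of the atomic CMPXCHG hardware instruction:
// compare a value to an expected value, and if they are the same,
// set the old value to a new value. On a real CPU this happens as
// one atomic unit - this C function itself is NOT atomic!
int compare_and_swap(int *value, int expectedValue, int newValue)
{
    int oldValue = *value;

    if (oldValue == expectedValue)
        *value = newValue;

    return oldValue;
}

#define UNLOCKED 0
#define LOCKED   1

// The synchronization variable that stores the state of the spinlock
int lock = UNLOCKED;

void Foo()
{
    // Spin in the while loop until the synchronization variable is
    // unlocked, then lock it
    while (compare_and_swap(&lock, UNLOCKED, LOCKED) != UNLOCKED)
    {
        // The thread spins actively on the CPU here
    }

    // ... critical section: work on the shared data structure ...

    // Release the spinlock again
    lock = UNLOCKED;
}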

Let’s go through this code step-by-step. First of all, the logic of the function compare_and_swap is implemented through one atomic hardware instruction directly at the CPU level – CMPXCHG. The C function just illustrates what logic is implemented in CMPXCHG: you compare a value to an expected value, and if they are the same, the old value is set to a new value. Because the whole logic of CMPXCHG is implemented as one atomic unit on the CPU itself, no other thread can interrupt the execution of this assembly opcode.

To store the state of the spinlock itself, a variable with the name lock is used. A thread therefore has to spin in the while loop until the spinlock synchronization variable is unlocked. When this happens, the thread can lock the synchronization variable and finally enter the critical section in a thread-safe manner. This is again just a simplified (not thread-safe!) illustration of a Spinlock – things are a little bit harder and more complicated in reality.

The biggest problem with this traditional approach is that a shared resource is involved in the thread synchronization itself: the spinlock synchronization variable lock. If one thread holds the spinlock and gets suspended, all the other threads get stuck in the while loop when they try to acquire the spinlock. You can avoid this problem by introducing a lock-free coding technique.

A Lock-Free Approach
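
A simplified sketch of the same method Foo, rewritten in a lock-free fashion (the shared variable and the added value are illustrative):

// The shared variable, previously protected through the spinlock
int sharedValue = 0;

void Foo()
{
    int oldValue;
    int newValue;

    do
    {
        // Read the current value of the shared variable
        oldValue = sharedValue;
        newValue = oldValue + 42;

        // Try to install the new value atomically. If some other thread
        // has modified sharedValue in the meantime, the compare-and-swap
        // fails, and we just retry - no thread is ever blocked.
    }
    while (compare_and_swap(&sharedValue, oldValue, newValue) != oldValue);
}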

As you can see, the implementation of the method Foo has completely changed. Instead of trying to acquire a spinlock, the implementation just checks whether some other thread has modified the shared variable (earlier protected through a Spinlock) before the atomic addition is performed. This means that no shared synchronization resource is used anymore, and threads no longer block each other. That’s the main idea of Lock-Free Data Structures and Non-Blocking Algorithms.

In-Memory OLTP in SQL Server 2014 uses the same approach to install page changes in the mapping table of the Bw-Tree. Therefore there is no locking, latching, and spinlocking involved. If In-Memory OLTP “sees” that the address of a page in the mapping table has changed, it means that another thread has started a modification on that page – but hasn’t completed it yet (some other thread was scheduled in the meantime on the CPU). The individual threads in In-Memory OLTP are working together in a cooperative fashion. Therefore it is possible that the thread that has seen the modification in the mapping table just completes the “pending” operation – like a page split.

A page split in In-Memory OLTP consists of multiple atomic operations. Therefore one thread can begin a page split, and another thread can finally finish this page split. In a future blog posting I will also talk a little bit more about these types of page splits, and which changes are implemented in the Bw-Tree to make this sophisticated approach possible.

Summary

In today’s blog posting I have introduced the main idea behind Lock-Free Data Structures. The main idea is to check whether other threads have already performed an operation before trying to execute an atomic operation yourself. Therefore there is no longer any need to protect a critical section through a synchronization primitive like a Spinlock. And the idea of Lock-Free Data Structures and Non-Blocking Algorithms is also used by In-Memory OLTP, which was first introduced with SQL Server 2014.

Thanks for reading and your time,

-Klaus

Grouping Sets: the CUBE and ROLLUP subclauses

In my last blog posting I talked about the Grouping Sets functionality in SQL Server. As you have seen from last week’s examples, it is very easy to perform individual groupings by simply defining the necessary grouping sets. But what if you want all possible groups from a given set of columns – the so-called Power Set?

Of course, you can generate the power set manually with the grouping sets syntax, but that’s still a lot of code to write. Therefore today I want to show you 2 subclauses that the grouping sets functionality supports: the CUBE and ROLLUP subclauses.

CUBE Subclause

With the CUBE subclause you can generate all possible grouping sets for a given set of columns. This is the so-called Power Set. When you have the 3 columns a, b, and c, CUBE (a, b, c) produces the following groupings for you:

  • (a, b, c)
  • (a, b)
  • (a, c)
  • (b, c)
  • (a)
  • (b)
  • (c)
  • ()

The following query generates the power set for the columns CustomerID, SalesPersonID, and YEAR(OrderDate) manually through the grouping sets functionality introduced last week. 
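
Assuming the AdventureWorks sample database, the query could look like this:

SELECT CustomerID, SalesPersonID, YEAR(OrderDate) AS OrderYear, SUM(TotalDue) AS TotalDue
FROM Sales.SalesOrderHeader
GROUP BY GROUPING SETS
(
    (CustomerID, SalesPersonID, YEAR(OrderDate)),
    (CustomerID, SalesPersonID),
    (CustomerID, YEAR(OrderDate)),
    (SalesPersonID, YEAR(OrderDate)),
    (CustomerID),
    (SalesPersonID),
    (YEAR(OrderDate)),
    ()
)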

As you can see from the code, you have to specify every possible combination, so it’s again a hard, cumbersome job to write such a query for a simple requirement. Things get much easier if you use the CUBE subclause instead of specifying individual grouping sets. Let’s have a look at the following code.
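
The same query, rewritten with the CUBE subclause:

SELECT CustomerID, SalesPersonID, YEAR(OrderDate) AS OrderYear, SUM(TotalDue) AS TotalDue
FROM Sales.SalesOrderHeader
GROUP BY CUBE (CustomerID, SalesPersonID, YEAR(OrderDate))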

As you can see from the code, you only specify the columns, and SQL Server itself generates the power set out of it. That’s very simple, straightforward code compared with the first listing.

ROLLUP Subclause

Besides the CUBE subclause, SQL Server has also supported the ROLLUP subclause since SQL Server 2008. With the ROLLUP subclause you are able to define a subset of the power set. The ROLLUP subclause assumes a hierarchy between the individual columns. Imagine that we have again our 3 columns a, b, and c. When you use ROLLUP(a, b, c), it produces the following grouping sets:

  • (a, b, c)
  • (a, b)
  • (a)
  • ()

You can very easily see from these individual grouping sets that there is a hierarchy between the columns. Just substitute the columns a, b, and c with columns like OrderYear, OrderMonth, and OrderDay, and you will get an idea of the kind of analytical queries you can perform here. It’s the poor man’s implementation of SSAS (SQL Server Analysis Services). Let’s have a look at the following query that uses the ROLLUP subclause.
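
A sketch of such a query, again assuming the AdventureWorks sample database:

SELECT
    YEAR(OrderDate) AS OrderYear,
    MONTH(OrderDate) AS OrderMonth,
    DAY(OrderDate) AS OrderDay,
    SUM(TotalDue) AS TotalDue
FROM Sales.SalesOrderHeader
GROUP BY ROLLUP (YEAR(OrderDate), MONTH(OrderDate), DAY(OrderDate))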

The output of that query gives you the following individual grouping sets:

  • (OrderYear, OrderMonth, OrderDay)
  • (OrderYear, OrderMonth)
  • (OrderYear)
  • ()

The ROLLUP subclause has a very simple syntax, but the output that you are getting back is really powerful for data analysis!

Summary

I hope that you have enjoyed today’s blog posting about the CUBE and ROLLUP subclauses introduced with the grouping sets functionality in SQL Server 2008. If you have time, you can also leave a short comment and tell me if you already use these subclauses in your databases, or if you think they would make sense in your environment.

Thanks for reading and your time!

-Klaus
