The dangerous beauty of the PIVOT operator in SQL Server

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

In today’s blog posting I want to discuss one specific T-SQL language construct in SQL Server – the PIVOT operator that was introduced back with SQL Server 2005. I always refer to this language construct as one of the most dangerous ones in SQL Server – and you will very soon see why. Before we get into some specific problems and pitfalls, I first want to give you a basic overview of what can be accomplished with pivoting in SQL Server.

Overview

The basic idea behind the PIVOT operator in SQL Server is that you can rotate rows into columns during a T-SQL query. The operator itself was introduced back with SQL Server 2005 and is mainly used for databases that are built on the principles of an Entity Attribute Value model (EAV). The idea behind an EAV model is that you can extend database entities without performing database schema changes. For that reason an EAV model stores all attributes of an entity in one table as key/value pairs. Let’s have a look at the following table that models a simple key/value pairs table.

As you can see, I have inserted 2 database entities into that table, and every entity consists of multiple attributes. Every attribute is just an additional record in that table. If you want to extend the entity with additional attributes, you need to insert additional records into that table but there is no need to perform a database schema change – that’s the “power” of an open database schema…

Querying such an EAV table is of course very hard, because you are dealing with a flat key/value pair structure. For that reason you want to rotate the table contents, and rotate rows into columns. You can perform this rotation natively with the PIVOT operator, or completely manually through a traditional CASE expression. Before we get down to the nasty details of the PIVOT operator I want to show you how you can perform the rotation by hand with T-SQL and some CASE expressions. If you perform the rotation manually, your T-SQL query has to implement 3 different phases:

  1. Grouping Phase
  2. Spreading Phase
  3. Aggregation Phase

In the Grouping Phase we compact our EAV table into distinct database entities. In our case we perform a GROUP BY on the column RecordID. In the 2nd phase, the Spreading Phase, we use multiple CASE expressions to rotate rows into columns. And finally we use the MAX expression in the Aggregation Phase to return one distinct value for every row and column. Let’s have a look at the following T-SQL code.

As you can see from the code, it is very easy to identify each phase, and how they are mapped to the T-SQL query. The following picture shows you the result of the query, where we have finally rotated rows into columns.

Manually pivoted data

The PIVOT operator

Beginning with SQL Server 2005 (almost 10 years ago!), Microsoft introduced the PIVOT operator in T-SQL. With that operator you can perform the same transformation (rows to columns) with just one native operator. Sounds very easy and promising, doesn’t it? The following listing shows how you can perform the same transformation natively with the PIVOT operator.

When you run that query, you will get back the same result as seen in the previous picture. But when you look at the syntax of the PIVOT operator, you will see one big difference compared to the manual approach:

You can only specify the spreading and aggregation elements! There is no way to define the grouping elements explicitly!

The grouping elements are just the remaining columns that you haven’t referenced in the PIVOT operator. In our case we haven’t referenced the column RecordID in the PIVOT operator, therefore this column is used during the Grouping Phase. This leads to interesting side effects if we subsequently change the database schema, for example by adding additional columns to the underlying base table:

When you now run the same query with the PIVOT operator again (make sure you have non NULL values in that column), you will get back a completely different result, because the Grouping Phase is now done on the columns RecordID and SomeData (which we have just added).

The native PIVOT operator can lead to a wrong result

Contrast that with what happens if we re-execute the manually T-SQL query that we wrote to start with. It still returns the same correct result. That’s one of the biggest negative side effects of the PIVOT operator in SQL Server: the grouping elements can’t be defined explicitly. To overcome this problem, it is a best practice to use a table expression and only return the necessary columns from the table. With this approach you will also have no problems if you change the table schema at a later stage, because the additional columns are not returned (by default) from the table expression. Let’s have a look at the following listing.

As you can see from this code, I’m feeding the PIVOT operator through a table expression. And within the table expression you are only selecting the needed columns from the base table. This then means that you can change the table schema in the future without breaking the result of the PIVOT query.

Summary

I hope that this blog posting has shown you why the PIVOT operator can be very dangerous in SQL Server. The syntax itself leads to very efficient code, but as a side effect you can’t specify the grouping elements directly. Therefore you should always make sure to use a table expression to define exactly which columns are fed into the PIVOT operator to make the given result deterministic.

What is your experience with the PIVOT operator? Do you like it or not? If you don’t like it, what would you want to have changed?

Like or share to get the source code.

Thanks for reading!

-Klaus

14 Comments

  • Brett Gerhardi

    Does the pivot operator actually result in a more efficient execution plan?

    • Hello Brett,

      Thanks for your comment.
      I just checked both execution plans, and they are almost the same: Clustered Index Scan + Stream Aggregate operator.
      With the manual T-SQL solution, the Query Optimizer also added a Compute Scalar operator between the Clustered Index Scan and the Stream Aggregate operator.
      The Compute Scalar operator is used to perform the calculation of the CASE expressions.

      Best regards,

      -Klaus

  • After adding the extra column, I get the same result with the original Pivot query. (2014 Express)

    • Hello Filip,

      Thanks for your comment.
      Do you have also updated the newly added column with some data?
      When you have NULL values in the column, then the original PIVOT query gets you the same result.

      Just have a look at the enclosed source code.

      Best regards,

      -Klaus

  • Now add a column with Age (datatype tinyint) and try to use PIVOT 🙂

  • After adding the tinyint column, add that to the PIVOT and see what happens. Then add the column to the “old” solution with MAX(CASE…).

  • Julius

    Hi Klaus,

    I added the SomeData column, and put in ‘C’ for Record 1 and ‘D’ for Record 2, and when I run the first PIVOT query, I still get 2 rows.

  • Manu Mohan

    is there any methods to use PIVOT without any aggregate functions??

    • Klaus Aschenbrenner

      Hello Manu,

      I don’t think so.

      Thanks,

      -Klaus

  • Nice article 🙂

  • Nice article, Klaus, but this is not so dangerous as it may look like. Usually, you are not using the table directly in the PIVOT clause. Usually, a derived table used there. Roughly 100% percent of people will follow the syntax and example of PIVOT from MSDN – https://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx or from Craig Freedman’s blog – https://blogs.msdn.microsoft.com/craigfr/2007/07/03/the-pivot-operator/, so this situation will not happen to them.

  • Dharmik

    Very Helpful..

  • Alejandro

    Great articule! Thanks!

It`s your turn

Your email address will not be published. Required fields are marked *

SQLpassion

Copyright © 2015 by SQLpassion · Klaus Aschenbrenner · Imprint · Offerings · Academy · Contact · Go to Top