The power of Grouping Sets in SQL Server

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

Have you ever wanted to perform aggregations across multiple columns/dimensions in SQL Server without licensing SSAS (SQL Server Analysis Services)? I am not talking about using the Developer edition in production, nor am I talking about installing a pirate copy of SQL Server.

Mission Impossible? Not really, because it’s possible through the so-called Grouping Sets in SQL Server. I want to give you in this blog posting an overview about grouping sets, what kind of queries you can achieve with them, and what are their performance advantages.

Aggregations with Grouping Sets

Imagine you have an orders table, and you want to perform T-SQL query aggregations across multiple groups. In the context of the Sales.SalesOrderHeader table of the AdventureWorks2012 database, these groupings can be something like the following:

  • A grouping across “everything”
  • GROUP BY SalesPersonID, YEAR(OrderDate)
  • GROUP BY CustomerID, YEAR(OrderDate)
  • GROUP BY CustomerID, SalesPersonID, YEAR(OrderDate)

When you want to perform these individual groupings with a traditional T-SQL query, you need multiple statements, where you perform a UNION ALL between the individual result sets. Let’s have a look at such a query:

The approach used by this T-SQL statement has multiple disadvantages:

  • The T-SQL statement itself is huge, because every individual group is one distinct query.
  • The table Sales.SalesOrderHeader has to be accessed 4 times – once for every distinct query.
  • When you look at the execution plan, you can see that SQL Server performs an Index Seek (NonClustered) operation 4 times – once for every query.

You can dramatically simplify the T-SQL code that you need, if you use the grouping sets functionality introduced back in SQL Server 2008. The following code shows you the same query, but this time implemented with grouping sets.

As you can see from the code itself, you just specify the needed grouping sets inside the GROUP BY GROUPING SETS clause – everything else is performed transparently by SQL Server. The empty parentheses specify the so-called Empty Grouping Set, the aggregation across the whole table. When you also look at the output of STATISTICS IO, you can see that the table Sales.SalesOrderHeader was accessed only once! That’s a huge difference from the previous manual implementation that we have performed.

Within the execution plan SQL Server uses a Table Spool operator that stores the retrieved data temporarily in TempDb. The data from the work table created in TempDb is afterwards used in the second branch of the execution plan. Therefore the data isn’t rescanned for every group from the base table, which leads to a better performance of the whole execution plan.

Query plan that calculates Grouping Sets

When you look at the execution plan, you can also see that the query plan contains 3 Stream Aggregate operators (highlighted in red, blue, and green). These 3 operators are calculating the individual grouping sets:

  • The blue highlighted operator calculates the grouping set for CustomerID, SalesPersonID, and YEAR(OrderDate).
  • The red highlighted operator calculates the grouping set for SalesPersonID and YEAR(OrderDate). In addition it also calculates the grouping set across “everything”.
  • The green highlighted operator calculates the grouping set for CustomerID and YEAR(OrderDate).

The idea behind the 2 subsequent Stream Aggregate operators is to calculate so-called Super Aggregates – aggregations of aggregations.

Summary

In today’s blog posting I have introduced Grouping Sets to you, a T-SQL enhancement introduced back in SQL Server 2008. As you have seen grouping sets have 2 huge advantages: they simplify your code, and they improve the performance of the query by accessing the base data only once.

I hope that you have now got some good ideas about grouping sets, and it would be really interesting for me, if you can leave a short comment if you are able to use this functionality in your own databases.

Like or share this blog posting to get the source code.

Thanks for your time!

-Klaus

3 Comments

  • Good and quick introduction to GROUPING SET

  • Amey Deshpande

    Very nicely explained , Now the grouping set concept is much clear to me

  • Flash Ash

    Oldie but a goodie, very nice explanation, thanks!

    Another way of retrieving the same results I often see (and until just now, have been using) is the use of a CUBE() or ROLLUP() with various checks against Grouping_ID() or GROUPING() in the HAVING clause, for instance the following (ugly) code I’ve just been trying to fix:


    Group by ROLLUP(ColA, ColB, ColC)
    Having NOT (GROUPING(ColA) = 1
    AND GROUPING(ColB) = 1
    AND GROUPING(ColC) = 1)

    I was simply going to replace the Groupings in the having clause with a single Grouping_ID(ColA, ColB, ColC) != 7, but correct use of Grouping sets is miles better!

It`s your turn

Your email address will not be published. Required fields are marked *

SQLpassion

Copyright © 2015 by SQLpassion · Klaus Aschenbrenner · Imprint · Offerings · Academy · Contact · Go to Top