Clustered ColumnStore Indexes – Space Savings
Over the last weeks I have done a lot of work with Clustered ColumnStore Indexes in SQL Server 2014/2016, and it is really amazing what you can achieve with this technology. In today’s blog post I want to talk a little bit more about the space savings that you can make by using a Clustered ColumnStore Index.
Clustered ColumnStore Indexes
The ColumnStore Index was first introduced back with SQL Server 2012 – but only as a Non-Clustered Index that made your table read-only. I had a lot of customers who were really amazed about this new index, but as soon as I have showed them the limitations they changed their mind.
But in SQL Server 2014 Microsoft provides you with a Clustered ColumnStore Index that is also writeable – or at least it seems that way. But the great thing about it is the fact that it is a *Clustered* ColumnStore index. Therefore you are able to store your data primarily in the highly efficient ColumnStore format, and you don’t need to store the data additionally in the RowStore format. This is a huge advantage over the Non-Clustered ColumnStore Index that was introduced in SQL Server 2012.
But how much space saving can you make by using a Clustered ColumnStore Index? Until now I have always worked with the ContosoRetailDW database from Microsoft in my demonstrations, but this example database is a little bit – let’s say – “small”. I wanted to test the Clustered ColumnStore Index with a realistic, large data set to see how the ColumnStore Index performs and how much space you can save with it.
I therefore decided to use the data set from the TPC-H benchmark. I’m a big fan of the software Benchmark Factory from Dell (which I always use as a load generator in my SQL Server workshops). With this software it is quite easy to run a TPC-E (OLTP) or TPC-H (DWH) workload, and also generate the initial data set for it. And you can use a scaling factor to define how large your database will be.
To make things easy I have started up an Amazon EC2 instance (r3.8xlarge), which gives me 32 Cores, 244 GB RAM, and 2×320 GB SSD drives – enough hardware for my test case. The cloud really rocks here! (I am not being sarcastic! Really!) After I have installed SQL Server 2016 CTP 3.2 I generated the TPC-H database with a scaling factor of 250. This gave me a initial database size of around 403 GB. And trust me, it took some time to generate that database!
Benchmark Factory generates all the tables as Heap Tables – without any traditional Clustered Index. So 403 GB is our baseline. The largest table – H_LineItem – consists of around 1 500 000 000 rows. How much space do you think we can gain when we create a Clustered ColumnStore Index on each table? Let’s try it. (Side note: I am not worrying about Segment Elimination here, so I just created the Clustered ColumnStore Index from the Heap Tables without any predefined sorting order).
CREATE CLUSTERED COLUMNSTORE INDEX ccsi ON H_Customer GO CREATE CLUSTERED COLUMNSTORE INDEX ccsi ON H_LineItem GO CREATE CLUSTERED COLUMNSTORE INDEX ccsi ON H_Nation GO CREATE CLUSTERED COLUMNSTORE INDEX ccsi ON H_Order GO CREATE CLUSTERED COLUMNSTORE INDEX ccsi ON H_Part GO CREATE CLUSTERED COLUMNSTORE INDEX cssi ON H_PartSupp GO CREATE CLUSTERED COLUMNSTORE INDEX ccsi ON H_Region GO CREATE CLUSTERED COLUMNSTORE INDEX cssi ON H_Supplier GO
After the creation of all Clustered ColumnStore Indexes I have check the size of the tables again: the complete tables needed about 60 GB of space altogether! Imagine that: I was able to get a space saving of 343 GB just by creating Clustered ColumnStore Indexes! This is really amazing!
But we are not finished yet: Since SQL Server 2014 you can also enable a so-called Archival Compression, where SQL Server will use the LZ77 algorithm when you write the LOB data pages (that contains the ColumnStore data) out to the storage subsystem. So let’s enable Archival Compression on every table.
ALTER INDEX ccsi ON H_Customer REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) GO ALTER INDEX ccsi ON H_LineItem REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) GO ALTER INDEX ccsi ON H_Nation REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) GO ALTER INDEX ccsi ON H_Order REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) GO ALTER INDEX ccsi ON H_Part REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) GO ALTER INDEX cssi ON H_PartSupp REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) GO ALTER INDEX ccsi ON H_Region REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) GO ALTER INDEX cssi ON H_Supplier REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) GO
After the rebuild of every ColumnStore Index I checked the size of the tables again: around 49 GB! So the Archival Compression gave me around 11 GB of additional space savings – nice! The following picture gives you an overview of the detailed space savings on every table. You can also download a larger copy of that picture here.
As I always say in my SQL Server workshops: the ColumnStore Index in SQL Server is just a turbo-booster! USE IT! Since SQL Server 2014 there is no reason why you shouldn’t use it. And the Clustered ColumnStore Index gives you a *huge* space saving over traditional Clustered Indexes and/or Heap Tables.
Of course exactly how much space you can gain depends on your data. Just try it with your database in a test environment – I think you will be amazed at what you can achieve here.
Thanks for your time,