Choose your Hash Bucket Count very wisely in Hekaton!

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

Today I have encountered a very simple blog posting where someone was inserting 1 millions of rows into a Hekaton Hash Index of 2048 buckets. Therefore I was running some tests how Hash Collisions in the Hash Buckets will affect the throughput of your Hekaton workload – and the results are very, very interesting. In the first step I want to simple describe what a Hash Collision is.

As you might know (hopefully), Hekaton tables are implemented as Hash Indexes in SQL Server 2014. Wikipedia has a very nice page about the basics of Hash Tables in general, which are used as the basic foundation of Hash Indexes. In a nutshell, SQL Server is applying a Hash Function to the key value of your Hash Index, and the result of that Hash Function determines in which Hash Bucket your row will be stored finally. If multiple key values are hashed to the same value, SQL Server inserts multiple rows in that specific Hash Bucket, and just chains the multiple entries in that Hash Bucket. See the following picture (taken from Wikipedia):

HashCollision

As you can see from this picture, the keys “John Smith” and “Sandra Dee” are hashed to the same bucket – in our case, the bucket 152. This means that both rows are stored in the same Hash Bucket, which affects the INSERT performance, and also the SELECT performance of your queries. During the INSERT, SQL Server has to maintain the linked list, and during SELECT queries SQL Server has to scan the linked list. 

After that introduction, let’s demonstrate the performance impact of Hash Collisions with a simple example. Let’s create a database with a simple Hekaton table:

As you can see from the source code, I’m using here just a Hash Bucket count of 1024 – not too many buckets, when we insert afterwards 1 million of records. In the next step I’m also creating  a native compiled Stored Procedure, so that I can go wicked fast with Hekaton:

As you can see, I’m just inserting here 1 million of records in a simple loop. When I’m running this Stored Procedure in a VM with 1 CPU, and 2 GB RAM, the execution time is around 24 seconds – awefully slow. When you are doubling the Hash Bucket count over and over again till you have 1048576 buckets, you can see very nicely that the performance increases as you create more and more Hash Buckets. With 1048576 buckets, the execution of the same Stored Procedure just takes around 600ms! This is huge differentiation compared to the first test run with just 1024 buckets. You can also check with the DMV sys.dm_db_xtp_hash_index_stats how many buckets you have in your Hash Index.

So the moral of this story? Choose your Hash Bucket count for Hekaton Hash Indexes *very, very* wisely, as they are really impacting the performance of your workload! The optimal number of buckets is just the number of distinct values that you are storing in the Hash Index – with some additional free space, just to be on the safe side 🙂 You also have to make sure not to choose the bucket count too high, because otherwise you are just waisting memory. So again – as almost everything in SQL Server – it depends on your workload, besides Database Shrinks 😉

Thanks for reading

-Klaus

SQLpassion

Copyright © 2015 by SQLpassion · Klaus Aschenbrenner · Imprint · Offerings · Academy · Contact · Go to Top