Klaus Aschenbrenner - Thursday, December 01, 2011
A life between bits & bytes RSS 2.0
 Thursday, December 01, 2011

A few days ago, one of my customers asked if there is a possibility to get a notification from SQL Server when a query has to wait for a longer time, because an incompatible lock exists on the same object. The answer to this question was easy, because SQL Server provides you for this functionality the so-called Blocked Process Report. Unfortunately this functionality is not enabled by default. Therefore I want to give you a short overview about the Blocked Process Report in this weblog posting, and you can use it to further troubleshoot locking scenarios inside your SQL Server installation.

The Blocked Process Report itself is a simple event that you can trace through SQL Server Profiler or SQL Trace. The event is called Blocked Process Report and you can find it within the event group Errors and Warnings:

But that event is only raised, when you enable the Blocked Process Report functionality on the SQL Server instance level through the sp_configure option blocked process threshold. That option just accepts a number between 0 and 86400 and is the number of seconds that a query must wait for a lock until SQL Server raises the Blocked Process Report event. By default that configuration option has a value of 0 which means that this event is never raised. The following code sets the threshold value to 10 seconds:

sp_configure 'blocked process threshold', 10

RECONFIGURE

GO

To demonstrate the Blocked Process Report, I'm just creating a new transaction inside the AdventureWorks2008R2 database through an UPDATE statement:

BEGIN TRANSACTION

 

UPDATE Person.Person

SET Title = 'Mr'

WHERE BusinessEntityID = 1

After the execution of this statement, the query has now acquired an Exclusive Lock (X) on the record where the column BusinessEntityID is equal to 1. In a second session I'm now trying to read the same record. During the reading SQL Server tries to acquire a Shared Lock (S) which leads to a blocking scenario:

SELECT * FROM Person.Person

WHERE BusinessEntityID = 1

GO

When you have started SQL Server Profiler and when you have configured the Blocked Process Report event, you will see that the event is reported about after 10 seconds:

As you can see from the screenshot, the Blocked Process Report itself is just XML data, so it is very easy to further analyze it, when you are familiar with XML and XQuery.There are 2 important nodes of XML Data - <blocked-process> and <blocking-process>. The first one - <blocked-process> describes the session that was blocked. In our case this was the second session that issued the SELECT statement against the AdventureWorks2008R2 database. The most important thing here is the XML attribute waitresource, which contains the locked resource on which the session was waiting and exceeding the Blocked Process Threshold configuration option.

The second node <blocking-process> describes the session that currently holds the incompatible lock on the resource, on which the other session wants to acquire the lock. The most important part here is the XML element <inputbuf> which shows the SQL statement that acquired the incompatible lock. With that information in your hand it is very easy to further troubleshoot why the Blocking Threshold was exceeded and how you can continue on that (like killing the other session, when it is an orphaned transaction).

The most important thing that you have to remember when you are working with the Blocked Process Report is the fact that SQL Server just generates that XML report, SQL Server WILL NOT resolve the locking/blocking scenario for you! In our case this means that the second session with the SELECT statement will be also running and waiting after SQL Server has raised the Blocked Process Report – SQL Server will never kill here a session – SQL Server just reports that one session has exceeded the Blocked Process Threshold – nothing more.

Thanks for reading

-Klaus

Thursday, December 01, 2011 7:43:51 PM (Westeuropäische Zeit, UTC+00:00)  #    Comments [0] - Trackback
.NET German | SQLServer | SQLServerPedia
 Wednesday, November 23, 2011

I have a lot of customers with some impressive SQL Server workloads, those databases are several hundred GB large, with several hundred simultaneous users – just great environments for performance tuning and troubleshooting. Some weeks ago I thought about how I can simulate such workloads in my own environment, so that I can make my performance troubleshooting sessions and workshops more realistic and impressive. I'm not the big fan of using the (small) AdventureWorks database with some prepared queries for real performance troubleshooting. Unfortunately no one of my customers wants to share their workload and their specific problems with me and the whole other world in public ;-)

Because of that I was looking in the last days for some free tools, with which I can simulate large OLTP workloads with hundreds of different users, and where I'm also able to scale out the workload to thousands of users – if I want and I have the requirement to do it. During the weekend I finally found one very interesting tool that is free and simulates a whole TPC-C workload: it's called Hammerora and it is a free Open Source Oracle Load Test Tool that simulates an OLTP based TPC-C workload. Yes, you read correct: the tool is written for Oracle, but fortunately they also support SQL Server! You can find the tool at http://hammerora.sourceforge.net. In this weblog posting I want to give you a quick overview about the usage of the tool, and how easy it is to run a whole TPC-C workload on your own SQL Server within a few minutes.

After the installation of the tool, you can start it through the provided batch file hammerora.bat, which is stored in the installation directory.

When you want to simulate a TPC-C based workload, you have to do 2 different things:

  • Creating the necessary database with the initial data
  • Run the TPC-C against the created database

Let's have a more detailed look on both of these steps. Before you can create the actual database, you have to tell the tool with which database system you are working. Hammerora supports the following database systems:

  • Oracle
  • MySQL
  • Microsoft SQL Server

You can set your actual database through the menu option Benchmark/Benchmark Options:

When you are working with Oracle or MySQL, Hammerora also supports a TPC-H (Data Warehouse) based workload. After you have set your database system to SQL Server, you can go to the menu Benchmark/TPC-C/TPC-C Schema Options.

This dialog allows you to set all the necessary options that are needed by Hammerora to create the actual TPC-C database schema. In the first part of the dialog you can specify your SQL Server connection info, and in the following part you can define the number of warehouses (parameter Number of Warehouses) that Hammerora creates in the database. You can find more detailed information about the database structure on the official TPC website at http://tpc.org/tpcc/detail.asp, which also describes the concept of a warehouse that is used by the TPC-C benchmark. With the parameter Virtual User to Build Schema you can specify how many sessions Hammerora should use during the creation of your database. You can think of a Virtual User as a session in SQL Server. This option allows you to create your TPC-C database in parallel with simultaneous sessions. When you have specified the necessary configuration options, you are ready to create your database. You just have to hit the Create TPC Schema symbol in the toolbar (12th symbol from left.). The creation of one warehouse needs around 130MB in the database. So it's also very easy to create a database with several hundred GBs of data in it – just increase the number of warehouses, but trust – it will take some time! J

After the creation of the TPC-C database, you have to configure which kind of test suite Hammerora has to execute against your database. The test suite itself is implemented as a so-called Driver Script. When you go again to the menu Benchmark/TPC-C/TPC-C Schema Options, you can see that Hammerora supports 2 kinds of driver scripts:

  • Standard Driver Script
  • Timed Test Driver Script

The Standard Driver Script just executes a continuous workload against your SQL Server database: when one query is finished the next query is submitted without any pausing. Your SQL Server will be hammered with a continuous workload, and how can see how far your SQL Server installation will scale. With the Timed Test Driver Script Hammerora introduces so-called Keying- and Thinking Times, which are also part of the official TPC-C benchmark. With this option, Hammerora submits about 3 queries per minute to your SQL Server, because Hammerora tries to simulate real users, which have to key in their data (Keying Time) and need some time for thinking (Thinking Time). With the Timed Test Driver Script you need a very high amount of Virtual Users to get a reasonable workload on your SQL Server. Further information about the various options that the Timed Test Driver Scripts accepts can be also found in the documentation of Hammerora.

For my scenarios I have just used the Standard Driver Scripts, because I want to have a continuous workload against SQL Server, so that I can demonstrate various performance tuning and troubleshooting techniques. When you have selected the corresponding driver script, you have to generate the actual script that is used for creating the workload against SQL Server. You can generate that script by using the menu option Benchmark/TPC-C/TPC-C Driver Script. When you have selected that option, Hammerora shows the driver script as an output in the main window:

The only thing that you finally have to configure is the amount of Virtual Users that Hammerora uses to execute the Driver Script against your SQL Server database. You can set the amount of Virtual Users through the menu option Virtual Users/Vuser Options:

As a last step you need to create those Virtual Users inside Hammerora by using the symbol Create Virtual Users (10th symbol from left) from the toolbar. You can see the created Virtual Users in the middle part of the main window. Finally you can use the symbol Run Hammerora Loadtest (11th symbol from left) from the toolbar to start the TPC-C workload against SQL Server. It takes some seconds until Hammerora has created all the necessary sessions, but then the actual TPC-C workload is executed against SQL Server:

As you can see from this weblog posting, it is very easy to create an OLTP based workload that is executed against SQL Server. With this approach it is also very easy to demonstrate some key performance problems inside SQL Server like incorrect Transaction Log usage, Locking/Blocking scenarios, Memory Management, misconfigured I/O subsystems etc.

On the other hand you can also demonstrate some other advanced concepts like Clustering, Database Mirroring, Log Shipping, Replication etc. with real workloads, and how some features affects the availability of your database. The possible scenarios are just endless, and you are just working with a real OLTP workload which can scale very easily to thousands of concurrent users, if you want and you have the resources… I think this approach can give our conference sessions and workshops a new feeling when we demonstrate SQL Server concepts and performance troubleshooting J

Thanks for reading

-Klaus

Wednesday, November 23, 2011 7:38:50 PM (Westeuropäische Zeit, UTC+00:00)  #    Comments [1] - Trackback
.NET German | SQLServer | SQLServerPedia
 Thursday, November 17, 2011

I'm very proud to announce today that I'm running next year a new event series across Europe: the SQL Server 2012 Deep Dive Days. You can learn in this 3 day event all about the new things in SQL Server 2012. The event series is currently planned for the following cities around Europe:

  • March 3 – 5, 2012 in Vienna/Austria
  • March 12 – 14, 2012 in Munich/Germany
  • March 19 – 21, 2012 in Zürich/Switzerland
  • April 16 – 18, 2012 in Cologne/Germany

You can find further detailed information at http://www.sqlpassion.at/events.html where you can also register for the various locations. If you are not from the German speaking area of Europe, I'm also planning 2 events in English in May 2012 at the following locations:

  • Brussels/Belgium
  • London/United Kingdom

In this 3 days long event I'm covering the following topics about SQL Server 2012:

Day 1: SQL Server 2012 Development

  • Project Juneau
  • T-SQL Enhancements
  • Spatial Data Types
  • File Table
  • Fulltext Indexes
  • Semantic Search
  • Service Broker
  • SQL Express Edition - LocalDB
  • Contained Databases
  • Metadata Discovery
  • SQL Server Integration Services

Day 2: SQL Server 2012 Administration

  • Extended Events
  • Columnstore Indexes
  • LOB Online Index Rebuilds
  • Adding Online Non-NULL Values
  • Custom Server Roles
  • Resource Governor
  • Database Recovery Advisor
  • Distributed Replay
  • Memory Management
  • Partitioned Tables
  • Troubleshooting Enhancements
  • Windows Server Core Support
  • Windows Server 8 & SQL Server 2012

Day 3: SQL Server 2012 High Availability

  • Failover Clustering Enhancements
    • Local TempDb
    • Health Check Policies
    • Failover Policy Levels
    • Indirect Checkpoints
  • AlwaysOn
    • Availability Groups
    • Availability Group Listeners
    • Monitoring & Troubleshooting
  • Multiple Secondaries
    • Automatic Failovers
    • Read-Only Connections
    • Backups

If you are interested in attending one of these events, don't hesitate and register at http://www.sqlpassion.at/events.html. There is also an Early-Bird price available until the end of this year.

BTW: if you have missed the news, Microsoft released a few hours ago the RC0 version of SQL Server 2012 – you can download the bits here: http://www.microsoft.com/download/en/details.aspx?id=28145

Thanks for reading!

-Klaus

Thursday, November 17, 2011 5:51:00 PM (Westeuropäische Zeit, UTC+00:00)  #    Comments [0] - Trackback
.NET German | Conferences | SQLServer | SQLServerPedia
 Friday, November 04, 2011

Today I want to talk about a phenomenon regarding memory management in SQL Server. A few weeks ago I had a consulting customer where we made some SQL Server performance improvements. As a side note the customer asked me, why SQL Server isn't using as much memory as possible on their production system. Their database was about 500 GB, and when we looked into the Task Manager of Windows, we saw that SQL Server was just consuming a few hundred MBs:

In the first step this behavior of SQL Server was a little bit amazing, but the answer was found very fast, when we looked into the details of their production configuration. The problem was that the customer granted the Locked Pages in Memory privilege to the service account under which SQL Server was running. You can check if you have granted this privilege when you look into the current SQL Server Error Log. If you have granted that privilege, you can see following informational message during the startup of SQL Server:

In the first step of this posting I want to explain why and when you should use that privilege for SQL Server, and in the second step you will see, how SQL Server uses that privilege and why Task Manager is lying to you regarding the memory consumption.

Every time when the Windows OS gets into memory pressure, the Windows OS raises a so-called Memory Resource Notification Event to all processes that are currently running on the box (see http://msdn.microsoft.com/en-us/library/windows/desktop/aa366541(v=vs.85).aspx for the corresponding API function). SQL Server subscribes to this event and processes it internally through a component called the Resource Monitor. Additional information about the Resource Monitor component is reported through the Ring Buffer type RING_BUFFER_RESOURCE_MONITOR that is stored inside the DMV sys.dm_os_ring_buffers. When SQL Server receives the Memory Resource Notification Event, SQL Server must trim its internal caches, like the Buffer Pool, or the Plan Cache to get out of memory pressure. But there are some scenarios in which SQL Server reacts to slow to memory pressure. In that case the Windows OS will do a so-called Hard Working Set Trim, and pages the process (in our case SQL Server) out to the page file. The Windows OS just want to survive and get out of its memory pressure. Imagine that – the Buffer Pool, the Plan Cache – In-Memory buffers – are paged out to the page file! Accessing the Buffer Pool – our in-memory cache - means reading pages from the physical disk! You can imagine that the performance of SQL Server will decrease massively…

Another case when a Hard Working Set Trim can occur is, when you have device drivers that have bugs, or when the Windows OS has bugs that leads to Hard Working Set Trims. Under Windows Server 2003 there were several bugs reported that caused Hard Working Set Trims, like:

  • Copying large files from the SQL Server box
  • Establishing a Remote Desktop Connection to your SQL Server box

But how you can find out if your performance problems occur because of Hard Working Set Trims? When you are running a version of SQL Server prior to SQL Server 2005 SP2, you have to monitor the overall performance of your SQL Server instance, there are no messages or indications inside SQL Server that can tell you, if a Hard Working Set Trim degraded the performance of SQL Server. Beginning with SQL Server 2005 SP2 Microsoft has added an error message to the SQL Server Log, as soon as a Hard Working Set Trim occurred, like:

" A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 1086400, committed (KB): 2160928, memory utilization: 50%"

In such cases you can enable the Locked Pages in Memory privilege to get rid of this issue. In this case the Windows OS is not allowed to page out pages that SQL Server allocated for the Buffer Pool. Yes, you read correct: Locked Pages in Memory is only used for Buffer Pool allocations inside SQL Server!

But on the other hand you also have to investigate WHY a Hard Working Set Trim occurred on your system. A Hard Working Set Trim is just the result of another problem that you have on your Windows box. So you should find the root cause, and eliminate it, instead of using the shortcut (enabling Locked Pages in Memory) infinitely. I know a lot of DBAs who are using Locked Pages in Memory by default, and there are several pro and cons for this which I don't want to discuss here again. Glen Berry has a great article which describes more of those pros and cons (see http://sqlserverperformance.wordpress.com/2011/02/14/sql-server-and-the-lock-pages-in-memory-right-in-windows-server). One thing that I want to mention here again is the fact that you should limit the memory that SQL Server can allocate for the Buffer Pool when you use Locked Pages in Memory. You can limit the memory through the Max Memory Setting of your SQL Server instance.

By now you know when and how you can enable Locked Pages in Memory for your SQL Server instance. But why the heck Task Manager is reporting a wrong Working Set size when SQL Server is using that privilege. For this explanation we have to dig a little bit deeper into the Win32API. By default (without Locked Pages in Memory) SQL Server allocates memory for the Buffer Pool through the VirtualAlloc function of the Win32API (see http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx). But when the SQL Server service account has the Locked Pages in Memory privilege, SQL Server internally uses the AllocateUserPhysicalPages Win32API function (see http://msdn.microsoft.com/en-us/library/windows/desktop/aa366528(v=vs.85).aspx) to do Buffer Pool allocations. This function can be only called by a process, when the process has the SeLockMemoryPrivilege – in other words the Locked Pages in Memory privilege which is the user-friendly name. The Win32API function AllocateUserPhysicalPages is part of the Address Windowing Extensions API (AWE). Therefore those memory allocations can't be paged out by the Windows OS, because AWE memory regions are not page able by design. Those memory regions are just locked in memory, therefore the name of this privilege. When you have Locked Pages in Memory for your SQL Server service account enabled, it also means that you are using AWE indirectly – funny isn't it? As you can see from this explanation, Locked Pages in Memory is rather a Windows OS concept than a SQL Server concept. It has nothing to do directly with SQL Server. SQL Server is just a consumer of it.

With this basic knowledge in your hand, it is now very easy to explain why Task Manager doesn't show the correct Working Set of the SQL Server process: Task Manager is not reporting those memory allocations that are done through the Win32API function AllocateUserPhysicalPages – that's all about this phenomenon.

But how can you now find out, how much memory SQL Server is really using? There are several possibilities. Inside SQL Server you can use the DMV sys.dm_os_process_memory and the column physical_memory_in_use_kb. This returns you the physical memory that SQL Server is currently using including AWE memory allocations. If you want to have a more detailed breakdown of the memory consumption of SQL Server, you can use the DMV sys.dm_os_memory_clerks. A memory clerk is a component inside SQL Server, which tracks memory allocations for a specific component inside SQL Server. SQL Server gives you a memory clerks for each major component for each available NUMA node in your system. The column awe_allocated_kb shows you the AWE memory allocations in kb that were allocated by SQL Server through the Win32API function AllocateUserPhysicalPages.

Update:
Aaron Bertrand (see http://sqlblog.com/blogs/aaron_bertrand) mentioned that it would also make sense to reference a Performance Monitor counter to track the memory consumption of SQL Server. So here it is:

Outside of SQL Server you can use the Performance Monitor counter SQLServer:Memory Manager/Total Server Memory (KB) to track how large the Buffer Pool currently is.

As you have seen with this posting, memory management inside SQL Server is a really complicated topic, and we just have touched the tip of the iceberg. So don't trust Task Manager blindly regarding the memory consumption of SQL Server, you really have to know how SQL Server is configured to get the correct picture.

Thanks for reading

-Klaus

Friday, November 04, 2011 9:20:13 PM (Westeuropäische Zeit, UTC+00:00)  #    Comments [1] - Trackback
.NET German | Conferences | SQLServer | SQLServerPedia
 Wednesday, November 02, 2011

On October 18, 2011 I have made unfair and incorrect words on Twitter, Facebook, LinkedIn, and Xing against the owners of SQLWorkshops.com. Because of this misbehavior of me, I got a so-called dissuasion letter from SQLWorkshops.com, which I have signed today, because I'm committing me now not to make such unfair and incorrect words any more in the future against SQLWorkshops.com.

I also want to take the chance and correct my unfair and incorrect statements of October 18, 2011. It was incorrect and wrong by me that I have blamed SQLWorkshops.com for the following things:

  • Unfair Business Conduct ("Some people are not playing fair")
  • Sending of threatening emails and letters ("They wrote me and Adam Machanic hard threating emails, because we were presenting on memory spills at sqlpass. At least I and Adam Machanic got a sqlsue email, because we have presented on memory spills at sqlpass.")
  • Organizing and running workshops of bad quality and of oversized prices ("They are charging horrible amounts for their workshops. Talked to some customers and they really suck. The attendees are not really amazed. Content is not really that bright…")
  • They have made pressure against me to remove my query memory spills demo from my various session downloads.

I apologize for any inconveniences that I have made with these unfair and incorrect statements.

-Klaus

Wednesday, November 02, 2011 4:16:15 PM (Westeuropäische Zeit, UTC+00:00)  #    Comments [0] - Trackback
.NET German | Conferences | SQLServer | SQLServerPedia
 Tuesday, October 25, 2011

Today I want to talk about a specified wait type that can be sometimes very hard to troubleshoot: the THREADPOOL wait type. This wait type is specific to the internal thread scheduling mechanism inside SQL Server.

As you might know SQL Server internally uses its own "operating system" to implement thread scheduling and memory management – the SQLOS. The SQLOS provides a set of worker threads that are used to execute queries that are submitted to SQL Server. The problem is that those worker threads can be exhausted sometimes - maybe because of a Locking/Blocking scenario. In this case SQL Server isn't able to execute any more requests inside the engine, because no free worker threads are available any more.

You can configure through the max worker threads option (through sp_configure) how many worker threads are available to SQLOS. By default the value of this option is 0, which means SQL Server itself decides how many worker threads are used. The number of the available worker threads depends on the processor architecture (x32, x64) and the number of CPUs that you have available. Books Online (see http://msdn.microsoft.com/en-us/library/ms187024.aspx) has the following table that describes the various possible combinations:

Number of CPUs

x32

x64

<= 4 Processors

256

512

8 Processors

288

576

16 Processors

352

704

32 Processors

480

960

 

You can also check through the column max_workers_count in sys.dm_os_sys_info how many worker threads your SQL Server instance is using. With the following example I want to demonstrate now how you can get thread starvation in SQL Server and how you can resolve it.

CAUTION: DON'T DO THE FOLLOWING STEPS ON A PRODUCTION SYSTEM!!!

In the first step we create a new database and a simple table for our sample scenario. I want to be unique as possible; therefore I use unique table and column names ;-)

USE master

GO

 

CREATE DATABASE ThreadPoolWaits

GO

 

USE ThreadPoolWaits

GO

 

-- Create a new test table (this one will be unique on earth - hopefully...)

CREATE TABLE [SomeCrazyUniqueTableName_6EFF088F-443B-4EBC-A4C7-9FC146D2EE49]

(

[MyUniqueColumnName1_F67DAC4A-C202-49BB-829A-071130BF1160]

INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,

[MyUniqueColumnName2_438B7184-B476-48A4-B5FA-DC34B99FA0A4]

INT

)

GO

 

-- Insert a record

INSERT INTO [SomeCrazyUniqueTableName_6EFF088F-443B-4EBC-A4C7-9FC146D2EE49]

VALUES (1)

GO

As you can see from the previous listing, our table definition is very simple. In the next step I'm creating a new stored procedure that encapsulates some read workload inside that database.

-- Create a stored procedure that encapsulates a read workload

CREATE PROCEDURE MyCustomUniqueStoredProcedureName_ReadWorkload

AS

BEGIN

SELECT * FROM [SomeCrazyUniqueTableName_6EFF088F-443B-4EBC-A4C7-9FC146D2EE49]

END

GO

Finally we are beginning a new transaction, making an update to the previous created table, and never committing that transaction:

-- Begin a transaction that never commits...

BEGIN TRANSACTION

UPDATE [SomeCrazyUniqueTableName_6EFF088F-443B-4EBC-A4C7-9FC146D2EE49]

WITH (TABLOCKX)

SET [MyUniqueColumnName2_438B7184-B476-48A4-B5FA-DC34B99FA0A4] = 2
GO

By now you have a pending transaction inside your SQL Server instance that holds an exclusive table lock because of the TABLOCKX query hint. When you now execute the previous created stored procedure from a different session, the stored procedure is waiting because it needs to acquire a Shared lock for reading the record:

EXEC MyCustomUniqueStoredProcedureName_ReadWorkload

GO

You can also check this Locking/Blocking scenario through the DMV sys.dm_db_tran_locks, which will show you a waiting request:

SELECT

resource_associated_entity_id,

request_mode,

request_status,

request_session_id

FROM sys.dm_tran_locks

WHERE resource_database_id = DB_ID('ThreadPoolWaits')

AND resource_type = 'OBJECT'

GO

In this simple scenario with just one waiting query inside SQL Server, nothing special occurs. But how will SQL Server react when you use a massive amount of queries that is larger than the possible max worker threads? Let's try it. I'm using for this task the ostress.exe utility that is part of the RML Tools that are provided free by Microsoft (see here). In my configuration (x64, 8 CPUs) SQL Server uses internally 576 worker threads. So I'm simulating with ostress.exe 600 concurrent connections to SQL Server through the following command line:

ostress.exe
-Q"EXEC ThreadPoolWaits.dbo.MyCustomUniqueStoredProcedureName_ReadWorkload"
-n600

When you execute that command prompt, it takes a few seconds until ostress.exe has created the 600 worker threads, and nothing special happens. Seems so far so good. Let's now analyze the situation and create a new connection through SQL Server Management Studio to your SQL Server instance. Oops, the connection can't be made:

SQL Server isn't responding anymore!!! This makes sense, because we have now exhausted the maximum available worker threads. Almost all submitted requests to SQL Server are currently waiting for a Shared Lock (LCK_M_S wait type), and all the other ones can't be enqueued inside SQL Server because no worker threads are available anymore (THREADPOOL wait type). But how can we troubleshoot that scenario now? Restarting SQL Server isn't always really an option…

Fortunately Microsoft provides since SQL Server 2005 the so-called Dedicated Admin Connection (DAC). With this connection you are able to log into SQL Server even when you have worker thread starvation or high memory pressure, because the DAC has its own

  • Scheduler
  • Memory Node
  • TCP Port

inside SQLOS. Therefore SQL Server is able to accept and serve the DAC connection – even in high sophisticated troubleshooting scenarios like this one. But there is only one available DAC for the whole SQL Server instance, which must be also taken into account! When you want to connect through the DAC, you have to use the following syntax: admin:<servername> where <servername> is the name of your SQL Server instance. So let's start up a new instance of SQL Server Management Studio and log into SQL Server through the DAC. Please be aware that you don't connect the Object Explorer through the DAC, because the DAC isn't supported for the Object Explorer. You can only use a simple query window that connects through the DAC:

When you have successfully connected through the DAC, you are now able to run your diagnostic queries. You must be also aware that the DAC doesn't support Auto Completion, because Auto Completion uses its own connection in the background – so you have to know the DMVs you want to use for troubleshooting J. In the first step we can check sys.dm_exec_requests which requests are currently waiting inside SQL Server:

SELECT

r.command,

r.sql_handle,

r.plan_handle,

r.wait_type,

r.wait_resource,

r.wait_time,

r.session_id,

r.blocking_session_id

FROM sys.dm_exec_requests r

INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id

WHERE s.is_user_process = 1

GO

In my configuration this query returns me 547 rows, which means 547 user requests (WHERE s.is_user_process = 1) are currently waiting inside SQL Server. But we have started our stored procedure with 600 concurrent users… sys.dm_exec_requests shows you only those requests that have an underlying worker thread inside SQL Server, because those requests are currently executing inside SQL Server. But where are the others? Those other pending requests are only accessible through sys.dm_os_waiting_tasks – they have a wait type of THREADPOOL:

SELECT * FROM sys.dm_os_waiting_tasks

WHERE wait_type = 'THREADPOOL'

GO

 

They are just waiting until a new thread from the worker pool gets free. But in our scenario every thread is currently suspended and bound to a user request, therefore those requests will wait forever! You can also see the THREADPOOL wait type only inside sys.dm_os_waiting_tasks and never in sys.dm_exec_requests, because a request in sys.dm_exec_requests is already bound to a worker thread inside SQL Server. When you look back to the output of sys.dm_exec_requests you can also see the columns session_id and blocking_session_id at the end of the result set. Those 2 columns are showing you the blocking chain inside SQL Server:

As you can see almost every session has a blocking_session_id of 56, and the session_id 56 has a blocking_session_id of 52. The session_id 52 is our head blocker! Let's further analyze the session of the head blocker:

-- Analyze the head blocker session

SELECT

login_time,

[host_name],

[program_name],

login_name

FROM sys.dm_exec_sessions

WHERE session_id = 52

GO

 

-- Analye the head blocker connection

SELECT

connect_time,

client_tcp_port,

most_recent_sql_handle

FROM sys.dm_exec_connections

WHERE session_id = 52

GO

 

The most interesting column is here most_recent_sql_handle from sys.dm_exec_connections which we can use to retrieve the executed SQL statement. When you use the DMF sys.dm_exec_sql_text and pass in the value of the most_recent_sql_handle column you are able to retrieve the executed SQL statement:

SELECT [text] FROM sys.dm_exec_sql_text(0x01001A0015BE5D3170CC4483000000000000000000000000)

GO

 

This SELECT statement will return you the following string:

-- Begin a transaction that never commits... BEGIN TRANSACTION UPDATE [SomeCrazyUniqueTableName_6EFF088F-443B-4EBC-A4C7-9FC146D2EE49] WITH (TABLOCKX) SET [MyUniqueColumnName2_438B7184-B476-48A4-B5FA-DC34B99FA0A4] = 2

This is our initial query where we left our transaction open. By now we have tracked down the problematic query that had leaded to THREADPOOL and LCK_M_S waits, and finally we can kill that session through the DAC:

KILL 52

GO

Now it takes several seconds until the session is killed, and finally our blocking scenario is gone. By now your SQL Server is again able to accept new connections and will work in the usual way. When you are done with troubleshooting through the DAC connection, don't forget to close that special connection, because there is only one DAC available for the whole SQL Server instance! When you afterwards look into the SQL Server Error Log, you will also see a message like the following one:

New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 1680 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 0%. System Idle: 96%.

This is also an indication that you had worker thread starvation inside your SQL Server instance. As a side-effect this scenario has also leaded to so-called Deadlocked Schedulers, which Amit Banerjee describes in more detail here. When SQL Server encounters Deadlocked Schedulers, SQL Server will write out a Stack Dump to your SQL Server LOG directory. You can also see a Deadlocked Scheduler inside the SQL Server Error Log:

Sometimes I see customers which just blindly reconfigure the max worker threads setting inside SQL Server, because they think they need more worker threads for their workload. But as with almost every problem in SQL Server, there is some root cause which has leaded to the problem that you are currently seeing. In our scenario the root cause was an uncommitted transaction, which leaded to a blocking scenario, which leaded to thread starvation, which finally leaded to an unresponsive SQL Server. As you can see from this explanation, there could be a very long chain until you find your root cause – so keep that in mind for your next troubleshooting scenarios.

To make it easy for you to reproduce that special scenario, you can download the needed scripts from here.

Thanks for reading!

-Klaus

Tuesday, October 25, 2011 10:03:49 PM (Westeuropäische Sommerzeit, UTC+01:00)  #    Comments [2] - Trackback
.NET German | SQLServer | SQLServerPedia
 Wednesday, October 19, 2011

When you sometimes look at Execution Plans, you can see that the SELECT operator has sometimes a so-called Memory Grant assigned. This Memory Grant is specified in kilobytes and is needed for the query execution, when some operators (like Sort/Hash operators) in the Execution Plans need memory for execution – the so called Query Memory.

This query memory must be granted by SQL Server before the query is actually executed. The Query Optimizer uses the underlying Statistics to determine how much Query Memory must be acquired for a given query. The problem is now, when the Statistics are out-of-date, and SQL Server underestimates the processed rows. In this case, SQL Server will also request to less Query Memory for the given query. But when the query actually executes, the query can't resize its granted Query Memory, and can't just request more. The query must operate within the granted Query Memory. In this case, SQL Server has to spill the Sort/Hash-Operation into TempDb, which means that our very fast in-memory operation becomes a very slow physical On-Disk operation. SQL Server Profiler will report those Query Memory Spills through the events Sort Warnings and Hash Warning.

Unfortunately SQL Server 2008 (R2) provides you no events through Extended Events to track down those Memory Spills. In SQL Server 2012 this will change, and you will have additional events available inside Extended Events for troubleshooting this problem. In this posting I will illustrate you with a simple example how you can reproduce a simple Query Memory Spill because of out-of-date statistics. Let's create a new database and a simple test table inside it:

SET STATISTICS IO ON

SET STATISTICS TIME ON

GO

 

-- Create a new database

CREATE DATABASE InsufficientMemoryGrants

GO

 

SUE InsufficientMemoryGrants

GO

 

-- Create a test table

CREATE TABLE TestTable

(

Col1 INT IDENTITY PRIMARY KEY,

Col2 INT,

Col3 CHAR(4000)

)

GO

-- Create a Non-Clustered Index on column Col2

CREATE NONCLUSTERED INDEX idxTable1_Column2 ON TestTable(Col2)

GO

The table TestTable contains the primary key on the first column, and the second column is indexed through a Non-Clustered Index. The third column is a CHAR(4000) column which isn't indexed. We will use that column afterwards for an ORDER BY, so that the Query Optimizer must generate an explicit Sort Operator inside the Execution Plan. In the next step I'm just inserting 1500 records, where we have an even data distribution across all the values in the second column – each value exists once in our table.

With that test data prepared we can now execute a simple query, which must use a separate Sort operator in the Execution Plan:

DECLARE @x INT

 

SELECT @x = Col2 FROM TestTable

WHERE Col2 = 2

ORDER BY Col3

GO

 

This query uses the following Execution Plan:

When you look into SQL Server Profiler and you have enabled the above mentioned events, nothing happens. You can also use the DMV sys.dm_io_virtual_file_stats and the columns num_of_writes and num_of_bytes_written to find out if there was some activity in TempDb for a given query. This works – of course only – when you are the only person who currently uses the given SQL Server instance:

-- Check the activity in TempDb before we execute the sort operation.

SELECT num_of_writes, num_of_bytes_written FROM

sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)

GO

 

-- Select a record through the previous created Non-Clustered Index from the table.

-- SQL Server retrieves the record through a Non-Clustered Index Seek operator.

-- SQL Server estimates for the sort operator 1 record, which also reflects

-- the actual number of rows.

-- SQL Server requests a memory grant of 1024kb - the sorting is done inside

-- the memory.

DECLARE @x INT

 

SELECT @x = Col2 FROM TestTable

WHERE Col2 = 2

ORDER BY Col3

GO

 

-- Check the activity in TempDb after the execution of the sort operation.

-- There was no activity in TempDb during the previous SELECT statement.

SELECT num_of_writes, num_of_bytes_written FROM

sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)

GO

Again you will see no activity in TempDb, which means the output from sys.dm_io_virtual_file_stats is the same before and after executing the query. The query takes on my system around 1ms of execution time.

Now we have a table with 1500 records, means that our table needs 20% + 500 rows of data changes so that SQL Server will update the statistics. If you're doing the math, we need 800 data modifications in that table (500 + 300). So let's just insert 799 additional rows where the value of the second column is 2. We are just changing the data distribution and SQL Server WILL NOT update the statistics, because one additional data change is still missing, until Update Statistics is triggered automatically inside SQL Server!

-- Insert 799 records into table TestTable

SELECT TOP 799 IDENTITY(INT, 1, 1) AS n INTO #Nums

FROM

master.dbo.syscolumns sc1

 

INSERT INTO TestTable (Col2, Col3)

SELECT 2, REPLICATE('x', 4000) FROM #nums

DROP TABLE #nums

GO

When you now execute the same query again, SQL Server will now spill the Sort operation to TempDb, because SQL Server will only request a Query Memory Grant of 1024 kilobytes, which is estimated for just 1 record – the memory grant has the same size as before:

-- Check the activity in TempDb before we execute the sort operation.

SELECT num_of_writes, num_of_bytes_written FROM

sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)

GO

 

-- SQL Server estimates now 1 record for the sort operation and requests a memory grant of 1.024kb for the query.

-- This is too less, because actually we are sorting 800 rows!

-- SQL Server has to spill the sort operation into TempDb, which now becomes a physical I/O operation!!!

DECLARE @x INT

 

SELECT @x = Col2 FROM TestTable

WHERE Col2 = 2

ORDER BY Col3

GO

 

-- Check the activity in TempDb after the execution of the sort operation.

-- There is now activity in TempDb during the previous SELECT statement.

SELECT num_of_writes, num_of_bytes_written FROM

sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)

GO

If you check the Estimated Number of Rows in the Execution Plan, they are differing completely from the Actual Number of Rows:

When you track the query execution time you will also see that the execution time increased – in my case it increased up to 200ms, which is a huge difference to the earlier execution time of just 1ms! The DMV sys.dm_io_virtual_file_stats will also report some activity inside TempDb, which is also the evidence that SQL Server spilled the Sort operation into TempDb! SQL Server Profiler will also show you a Sort Warning event.

If you now insert one additional record, and you run the query again, everything is fine, because SQL Server will trigger the Statistics Update and estimate the Query Memory Grant correctly:

-- Insert 1 records into table TestTable

SELECT TOP 1 IDENTITY(INT, 1, 1) AS n INTO #Nums

FROM

master.dbo.syscolumns sc1

 

INSERT INTO TestTable (Col2, Col3)

SELECT 2, REPLICATE('x', 2000) FROM #nums

DROP TABLE #nums

GO

 

-- Check the activity in TempDb before we execute the sort operation.

SELECT num_of_writes, num_of_bytes_written FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)

GO

 

-- SQL Server has now accurate statistics and estimates 801 rows for the sort operator.

-- SQL Server requests a memory grant of 6.656kb, which is now enough.

-- SQL Server now spills the sort operation not to TempDb.

-- Logical reads: 577

DECLARE @x INT

 

SELECT @x = Col2 FROM TestTable

WHERE Col2 = 2

ORDER BY Col3

GO

 

-- Check the activity in TempDb after the execution of the sort operation.

-- There is now no activity in TempDb during the previous SELECT statement.

SELECT num_of_writes, num_of_bytes_written FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)

GO

So this is a very basic example which shows you how you can reproduce Sort Warnings inside SQL Server – not really a magic. Adam Machanic (http://sqlblog.com/blogs/adam_machanic) has done last week at the SQLPASS Summit in Seattle a whole session about Query Memory at a 500 level, where he went into more details on this complicated topic, especially in combination with Parallel Execution Plans.

Thanks for reading!

-Klaus

Wednesday, October 19, 2011 9:04:35 PM (Westeuropäische Sommerzeit, UTC+01:00)  #    Comments [0] - Trackback
.NET German | Conferences | SQLServer | SQLServerPedia
 Friday, October 14, 2011

As I have announced in my "Advanced SQL Server Troubleshooting" session at SQLPASS today, you can find here the slides & samples for download. Thanks for attending my session J

-Klaus

Friday, October 14, 2011 6:23:04 AM (Westeuropäische Sommerzeit, UTC+01:00)  #    Comments [0] - Trackback
.NET German | Conferences | SQLServer | SQLServerPedia
About the author/Disclaimer

Klaus Aschenbrenner provides independent SQL Server Consulting Services across Europe.

Klaus works with the .NET Framework and especially with the SQL Server 2005/2008 from the very early beginnings.

In the years 2004 - 2005 Klaus was entitled with the MVP award from Microsoft for his tremendous support in the .NET Community.

Klaus has also written the book Pro SQL Server 2008 Service Broker which was published by Apress in the Summer of 2008.



Contact
Klaus Aschenbrenner
Pichlgasse 16/6
A-1220 Vienna
Austria

© Copyright 2012
Klaus Aschenbrenner
Sign In
Archive
<December 2011>
SunMonTueWedThuFriSat
27282930123
45678910
11121314151617
18192021222324
25262728293031
1234567
Statistics
Total Posts: 241
This Year: 10
This Month: 2
This Week: 1
Comments: 145
Themes
Pick a theme:
All Content © 2012, Klaus Aschenbrenner
DasBlog theme 'Business' created by Christoph De Baene (delarou)