Skip to content

How to Create a Type2 SCD (Slowly Changing Dimension)

This article could just as well be called creating a historical snapshot table. This type of table is also referenced as a dimension depending on what kind of data repository it’s located in. Personally, I prefer to keep a historical snapshot table in a normalized data store that contains history. This normalized data store is typically the first stopping point from the source system. It is useful because it keeps historical snapshots of what the data looked like in the source system at any point in time.

To get started, let’s create a history table we will use to store the historical values.

History Table for Person

From above, we see that we have 4 additional columns:

  • Person_HistoryID – this is a surrogate key specific to our new table.
  • ChkSum – contains a CHECKSUM of all the columns used compare data discrepencies.
  • DateTime_From – the beginning date in which this record is effective.
  • DateTime_To – the ending date in which this record is no longer effective.

First, let’s create our sample source table and populate it with some data

CREATE TABLE Person(
	PersonID		int				IDENTITY(1,1) NOT NULL,
	Title			nvarchar(8)		NULL,
	FirstName		nvarchar(50)	NOT NULL,
	MiddleName		nvarchar(50)	NULL,
	LastName		nvarchar(50)	NOT NULL,
	EmailAddress	nvarchar(50)	NULL,
	Phone			nvarchar(25)	NULL,
	ModifiedDate	datetime		NOT NULL
)

SET IDENTITY_INSERT [dbo].[Person] ON
INSERT [dbo].[Person] ([PersonID], [Title], [FirstName], [MiddleName], [LastName], [EmailAddress], [Phone], [ModifiedDate])
VALUES (1, N'Mr.', N'Gustavo', NULL, N'Achong', N'gustavo0@adventure-works.com', N'398-555-0132', CAST(0x000096560110E30E AS DateTime))
INSERT [dbo].[Person] ([PersonID], [Title], [FirstName], [MiddleName], [LastName], [EmailAddress], [Phone], [ModifiedDate])
VALUES (2, N'Ms.', N'Catherine', N'R.', N'Abel', N'catherine0@adventure-works.com', N'747-555-0171', CAST(0x000096560110E313 AS DateTime))
INSERT [dbo].[Person] ([PersonID], [Title], [FirstName], [MiddleName], [LastName], [EmailAddress], [Phone], [ModifiedDate])
VALUES (3, N'Ms.', N'Kim', NULL, N'Abercrombie', N'kim2@adventure-works.com', N'334-555-0137', CAST(0x000096560110E313 AS DateTime))
SET IDENTITY_INSERT [dbo].[Person] OFF

Next, let’s create our history table:

CREATE TABLE Person_History (
   Person_HistoryID     int                  IDENTITY(1,1) NOT NULL,
   PersonID             int                  NOT NULL,
   Title                nvarchar(8)          NULL,
   FirstName            nvarchar(50)         NOT NULL,
   MiddleName           nvarchar(50)         NULL,
   LastName             nvarchar(50)         NOT NULL,
   EmailAddress         nvarchar(50)         NULL,
   Phone                nvarchar(25)         NULL,
   ModifiedDate         datetime             NULL,
   ChkSum               int                  NULL,
   DateTime_From        datetime             NULL,
   DateTime_To          datetime             NULL,
   CONSTRAINT PK_Contact_ContactID PRIMARY KEY NONCLUSTERED (Person_HistoryID)
         ON "PRIMARY",
   CONSTRAINT AK_UC_PERSONID_DATETI_PERSON_H UNIQUE CLUSTERED (PersonID, DateTime_From)
)

Now that we have our tables created, let’s look at the script that will import the data and close out the old records. In the first step, we look for Person records that have changed. We do this by comparing the checksum of the active record stored in the history table with a checksum we dynamically calculate off of the source records. If the checksums do not match, then we close out the currently active record, making it inactive. We then insert a new record containing the new values and make it active.

DECLARE @PersonID_Updated	table (PersonID int);
DECLARE @DateTime_Inserted	datetime;
DECLARE @DateTime_Future	datetime;

SET @DateTime_Inserted	= GETDATE();
SET @DateTime_Future	= '2079-06-06';

/************************************
* close old record if it exists
* and a value has been changed
*************************************/
UPDATE ph
	SET DateTime_to		= @DateTime_Inserted
OUTPUT inserted.PersonID
INTO @PersonID_Updated
FROM Person_History ph
JOIN Person p
	ON ph.PersonID = p.PersonID
AND
ph.chksum !=
	CHECKSUM
	(
		 p.Title
		,p.FirstName
		,p.MiddleName
		,p.LastName
		,p.EmailAddress
		,p.Phone
		,p.ModifiedDate
	)
AND DateTime_to = @DateTime_Future

/************************************
* insert any new or closed
* out records
*************************************/

INSERT INTO Person_History
(
	 PersonID
	,Title
	,FirstName
	,MiddleName
	,LastName
	,EmailAddress
	,Phone
	,ModifiedDate
	,ChkSum
	,DateTime_From
	,DateTime_To
)
SELECT
	 PersonID
	,Title
	,FirstName
	,MiddleName
	,LastName
	,EmailAddress
	,Phone
	,ModifiedDate
	,ChkSum =
		CHECKSUM
		(
			 p.Title
			,p.FirstName
			,p.MiddleName
			,p.LastName
			,p.EmailAddress
			,p.Phone
			,p.ModifiedDate
		)
	,DateTime_From = @DateTime_Inserted
	,DateTime_To = @DateTime_Future
FROM Person p
WHERE
NOT EXISTS
(
	SELECT 1
	FROM Person_History
	WHERE PersonID = p.PersonID
)
OR
EXISTS
(
	SELECT 1
	FROM @PersonID_Updated
	WHERE PersonID = p.PersonID
)

Using DBCC CHECKIDENT to Reseed a Table After Delete

I imagine you are just looking for simple syntax in order to reseed the identity column of a table you just deleted from.

Here is the quick version:

DBCC CHECKIDENT('##reseed_example', RESEED, @max_seed)

And here is an extended example:

-- populate a table with identity
SELECT
	 ID = IDENTITY(int,1,1)
	,name
INTO ##reseed_example
FROM dbo.sysobjects

-- delete some records
DELETE FROM ##reseed_example
WHERE ID > 5

-- find the current max identity
DECLARE @max_seed int = ISNULL((SELECT MAX(ID) FROM ##reseed_example),0)

-- use the current max as the seed
DBCC CHECKIDENT('##reseed_example', RESEED, @max_seed)

-- let's test
INSERT INTO ##reseed_example
(
	name
)
SELECT
	'newobject'

-- done
SELECT *
FROM ##reseed_example

it should be noted that in order to use DBCC CHECKIDENT you need to be dbo (db_owner). This does present an issue sometimes because often the reason the user is performing a delete instead of a truncate is because they do not have dbo rights. To overcome this, you can will need to create a procedure that that uses: WITH EXECUTE AS ‘dbo’. And reseed from there.

CROSS APPLY Explained

My first introduction to the APPLY operator was using the DMVs. For quite a while after first being introduced, I didn’t understand it or see a use for it. While it is undeniable that it is has some required uses when dealing with table valued functions, it’s other uses evaded me for a while. Luckily, I started seeing some code that used it outside of table valued functions. It finally struck me that it could be used as a replacement for correlated sub queries and derived tables. That’s what we’ll discuss today.

I never liked correlated subqueries because it always seemed like adding full blown queries in the select list was confusing and improper.

SELECT
	 SalesOrderID			= soh.SalesOrderID
	,OrderDate				= soh.OrderDate
	,MaxUnitPrice			= (SELECT MAX(sod.UnitPrice) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID)
FROM AdventureWorks.Sales.SalesOrderHeader AS soh

It always seemed to me that these operations should go below the FROM clause. So to get around this, I would typically create a derived table. Which didn’t completely feel right either, but it was still just a bit cleaner:

SELECT
	soh.SalesOrderID
	,soh.OrderDate
	,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
JOIN
(
	SELECT
		max_unit_price = MAX(sod.UnitPrice),
		SalesOrderID
    FROM Sales.SalesOrderDetail AS sod
    GROUP BY sod.SalesOrderID
) sod
ON sod.SalesOrderID = soh.SalesOrderID

What made this ugly was the need to use the GROUP BY clause because we could not correlate. Also, even though SQL almost always generates the same execution plan as a correlated sub query, there were times when the logic inside the derived table got so complex, that it would not limit the result set of the derived table by inferring the correlation first. This made this kind of query sometimes impractical.

Luckily, this is where the CROSS APPLY steps in so nicely. It gives us the best of both worlds by allowing us to correlate AND not have the query embedded in the select list:

SELECT
	soh.SalesOrderID
	,soh.OrderDate
	,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
CROSS APPLY
(
	SELECT
		max_unit_price = MAX(sod.UnitPrice)
    FROM Sales.SalesOrderDetail AS sod
    WHERE soh.SalesOrderID = sod.SalesOrderID
) sod

The other advantage this has over the correlated sub query is when we want to add more columns in our SELECT list, we do not have to completely repeat the entire query. We still have it in one place, making it somewhat modular. So instead of this:

SELECT
	 SalesOrderID			= soh.SalesOrderID
	,OrderDate				= soh.OrderDate
	,MaxUnitPrice			= (SELECT MAX(sod.UnitPrice) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID) -- 1
	,SumLineTotal			= (SELECT SUM(LineTotal) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID) -- 2
FROM AdventureWorks.Sales.SalesOrderHeader AS soh

We have this:

SELECT
	soh.SalesOrderID
	,soh.OrderDate
	,sod.max_unit_price
	,sod.sum_line_total
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
CROSS APPLY
(
	SELECT
		max_unit_price = MAX(sod.UnitPrice)
		,sum_line_total = SUM(sod.LineTotal)
    FROM Sales.SalesOrderDetail AS sod
    WHERE soh.SalesOrderID = sod.SalesOrderID
) sod

As for the execution plans, in my experience CROSS APPLY has always won. Not always by a lot, but it still wins.

So what is OUTER APPLY? It’s equivalent to a left join on the derived table.

SELECT
	soh.SalesOrderID
	,soh.OrderDate
	,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
LEFT JOIN
(
	SELECT
		max_unit_price = MAX(sod.UnitPrice),
		SalesOrderID
    FROM Sales.SalesOrderDetail AS sod
    GROUP BY sod.SalesOrderID
) sod
ON sod.SalesOrderID = soh.SalesOrderID
SELECT
	soh.SalesOrderID
	,soh.OrderDate
	,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
OUTER APPLY
(
	SELECT
		max_unit_price = MAX(sod.UnitPrice)
    FROM Sales.SalesOrderDetail AS sod
    WHERE soh.SalesOrderID = sod.SalesOrderID
) sod

Indexed Views

To explain what an indexed view is, let’s first look at what constitutes a view. A view may sound like a fancy elaborate thing, however all it is, is a saved SELECT statement, nothing else. It is not explicitly compiled, nor does it contain any data. When you select from a view, it goes to the underlying tables and retrieves the data at the time it is called.

An indexed view on the other hand, is a normal view that takes a copy of the underlying data it points to, and stores it locally. This way, the joins and aggregations that are processed at run-time in a normal view, are already materialized, so when queried, it’s as fast as querying a normal table. Therefore, another name for indexed view is “materialized view”. This is what it’s called in Oracle.

Creating
To make a normal view an indexed view, you need to do two things. First you need to enable SCHEMABINDING for the view. Schemabinding essentially locks the underlying DDL schemas for the tables that the view references. This prevents any DDL changes from being made to the referenced tables. If you want to make a change to the tables, you need to drop the view first.

Let’s create this new view using the AdventureWorks Database:

CREATE VIEW Sales.OrderTotals
WITH SCHEMABINDING
AS

SELECT
	SalesOrderID	= soh.SalesOrderID,
	OrderTotal		= SUM(sod.UnitPrice),
	OrderCount		= COUNT_BIG(*)
FROM Sales.SalesOrderHeader soh
JOIN Sales.SalesOrderDetail sod
ON sod.SalesOrderID = soh.SalesOrderID
GROUP BY
	soh.SalesOrderID

Okay, we’re half way done. Now we need to create a unique clustered index on the view. As you know, a clustered index is essentially a table sorted by it’s indexed keys. So once we create this unique clustered index this materializes all the data. And yes, it does need to be unique:

CREATE UNIQUE CLUSTERED INDEX IDX_C_vw_Sales_OrderTotals_SalesOrderID
ON Sales.OrderTotals
(
	SalesOrderID
)

There, now we have our indexed view completed. If we query the view and look at the execution plan, we should see that the newly created clustered index is referenced.

SELECT *
FROM Sales.OrderTotals

There are times however when the optimizer chooses NOT to use the indexed view. I have run into issues with this even as late as SQL 2005. The workaround for this is to use the query hint NOEXPAND. Also if you do not want to use the indexed view, you can use the other query option listed below:

-- Use the indexed view
SELECT *
FROM Sales.OrderTotals WITH (NOEXPAND)

-- Bypass the indexed view
SELECT *
FROM Sales.OrderTotals
OPTION (EXPAND VIEWS)

View Restrictions
There are also many restrictions in creating an indexed view. I recommend you review these now so you don’t learn the hard way.

The view definition must not contain:

  • ANY, NOT ANY
  • OPENROWSET, OPENQUERY, OPENDATASOURCE
  • arithmetic on imprecise (float, real) values
  • OPENXML
  • COMPUTE, COMPUTE BY
  • ORDER BY
  • CONVERT producing an imprecise result
  • OUTER join
  • COUNT(*)
  • reference to a base table with a disabled clustered index
  • GROUP BY ALL
  • reference to a table or function in a different database
  • Derived table (subquery in FROM list)
  • reference to another view
  • DISTINCT
  • ROWSET function
  • EXISTS, NOT EXISTS
  • self-join
  • expressions on aggregate results (e.g. SUM(x)+SUM(x))
  • STDEV, STDEVP, VAR, VARP, AVG
  • full-text predicates (CONTAINS, FREETEXT, CONTAINSTABLE, FREETEXTTABLE)
  • Subquery
  • imprecise constants (e.g. 2.34e5)
  • SUM on nullable expressions
  • inline or table-valued functions
  • table hints (e.g. NOLOCK)
  • MIN, MAX
  • text, ntext, image, filestream, or XML columns
  • non-deterministic expressions
  • TOP
  • non-unicode collations
  • UNION
  • contradictions SQL Server 2005 can detect that mean the view would be empty (e.g. where 0=1 and …)

Criticism in Designs

Sometimes in writing these articles, I feel like one of the old men (Statler & Waldorf) from the Muppet Show.. The reason is because I am often critical of the use of some features in SQL Server. This would be one of them. Now I do agree that Indexed views are cool and a necessary functionality for SQL Server, however I always try to steer clear of them. Why? In my opinion, they make maintenance difficult. The reason is, the schemabinding option disallows certain DDL modifications on referenced objects. Because of this, if you do want to make a changed to a referenced object, you need to drop the indexed view first. Well, if you need to do this, you also need a maintenance window. And hopefully the view and index are not too big, otherwise recreating could take you a while.

The other reason is the unpredictable need for the NOEXPAND hint. In the cases where I’ve wanted to create an indexed view, I wanted the new indexed view to replace a table that many stored procedures were referencing. So my plans were to create the new indexed view, rename the table to something like table_old, then rename the view to the original table name. Great.. that would work, however in reality every time I’ve done that, I also needed to add the NOEXPAND hint to each query so it would query the indexed view by default and not bypass it and use the underlying tables. For some reason the optimizer would choose not to reference the view. This foiled my plans because the point was to not touch the code in the stored procedures, but instead just rename the view to the table’s name.

So what is the workaround to not having to create an indexed view? Since the biggest use for indexed views I see is aggregations, I prefer to handle my own aggregations using a job and staging tables. Now, this cannot be done in real-time, so if you do need real-time you need to either do it in a trigger (ugh) or through the indexed view. But luckily in all my dealings, the incremental staging process has worked out fine.

Auto Update Statistics & Auto Create Statistics

The ability to create and update statistics is a crucial liberty for the query optimizer. The optimizer needs to know how it is going to query underlying tables, and when the statistics do not match the data, it is very probable that it will choose a non-efficient method for querying.

I’ve seen and heard a lot of debate from DBA’s who think they should turn this off. Of all those DBA’s I think there is one who was correct, and he ended up convincing me of his scenario so I’m not as closed minded as I was before. His server was so bottle-necked during peak time, that he updated the stats off hours. I’m very weary of those who blindly say “it’s better to turn it off” without any factual statistics to back them up. Most of the DBA’s that think they should turn it off are stuck in the SQL Server 7.0 days. At that time, turning this feature off was acceptable because they interfered with the currently running queries.

Example
Case in point, we had a DBA who did turn this feature off in an environment that mirrored production. I overheard the developers complaining how their queries took twice as long on this box. Looking at perfmon, I saw no physical bottlenecks. Since this was the case, I turned to statistics and found both auto update and auto create turned off. After turning it back on, the box was just as fast as production.

Long story short, these options control the one link between the optimizer and the underlying data. If you turn these off, then the very statistics that the optimizer uses to query the data will not be created or updated when needed. You will basically be taking away the optimizer’s reading glasses and turning off the light in it’s room. It will not be able to read anything accurately.

Exceptions
On the other hand, if your server is pegged at 85% CPU, and you’ve done all the optimization’s you can to speed it up, (indexes, stored procedures, memory increases) and you have a plan for updating the statistics in off hours AND you are frequently looking at updating missing indexes when necessary AND your system is mostly read-only (not insert or update intensive).. Then I suppose you can start considering it. But it should be the last leg of consideration in my opinion. And I would not turn off both of them at first. I would probably only turn off the Auto Create Statistics first, and leave the update; unless you have a high number of ad-hoc queries coming in and not much data gets updated.

So, as a summary (and if you didn’t read my long run-on sentence above):

  • Your CPU is constantly pegged over 90% and you have thoroughly exhausted all optimizations
  • Your system weighs heavily on the read-only side (not many inserts / updates)
  • When monitoring traces, you see that statman is taking a lot of resources that pushes you over the bottleneck
  • All queries running against the database are pre-defined (i.e. stored procedures) not many ad-hoc
  • You have a clear maintenance window to update stats – either nightly (preferred) or at least once a week
  • You thoroughly understand the need for statistics and really do not want to turn it off

I hope this helps, good luck with your statistical decisions. :)

Using DBCC UPDATEUSAGE

When DBCC UpdateUsage is run, page and row counts are updated. When this information is incorrect, it can give incorrect reports for how much space is used by tables and indexes. The most likely cause for incorrect usage stats is when a database is upgraded to a new compatibility level of SQL Server (i.e. 2000 to 2005). Therefore it’s important to run this command after the statistics have been updated.

It may not be a bad idea to run on a weekly or monthly schedule if your database undergoes frequent DDL modifications.

To execute you can run one of the following commands:

Use MyDB;
GO
DBCC UPDATEUSAGE(0); -- Execute for the current database
GO
DBCC UPDATEUSAGE(MyDB);  -- Execute using the database name
GO
DBCC UPDATEUSAGE(MyDB);  -- Execute using the database ID

This may take some time, so make sure to run it in a non production environment or during off hours as it may cause blocking.

You can also run for a specific table or index. If you run for a table, then the usage stats are updated for all indexes on that table.

-- Update for a table (and it's indexes)
DBCC UPDATEUSAGE(AdventureWorks, 'Sales.SalesOrderDetail');

-- Update usage for a single index
DBCC UPDATEUSAGE(AdventureWorks, 'Sales.SalesOrderDetail', 'IX_SalesOrderDetail_ProductID');

That’s basically it. Now you may be running these commands and are not seeing any output. Well, that’s because the usage stats are already correct. But if you want to spoof the stats in order to see the output, you can use these undocumented options:

Use Adventureworks;
GO
-- Override existing page & rowcount values
-- Do not do this in prod
UPDATE STATISTICS Sales.SalesOrderDetail WITH ROWCOUNT = 20000, pagecount = 10000;

-- Now when you run the update usage stats, it will show an update
DBCC UPDATEUSAGE(AdventureWorks, 'Sales.SalesOrderDetail');

Using IDENTITY_INSERT

The only way to insert values into a field that is defined as an “IDENTITY” (or autonumber) field, is to set the IDENTITY_INSERT option to “ON” prior to inserting data into the table.

To illustrate, let’s create a table that has an identity column defined.

-- Create table with identity column
CREATE TABLE MyOrders
(
	OrderID int IDENTITY(1,1),
	ProductName varchar(20)
);
-- Now try to insert into the table
INSERT INTO dbo.MyOrders
(
	OrderID,
	ProductName
)
VALUES
(
	1,
	'socks'
);

Executing the above code, we see that we get the following error:
Cannot insert explicit value for identity column in table ‘MyOrders’ when IDENTITY_INSERT is set to OFF.
Cannot insert explicit value for identity column in table 'MyOrders' when IDENTITY_INSERT is set to OFF.

To fix this, we simply need to set the IDENTITY_INSERT to On. Keep in mind, the minimum permissions needed to perform this action is database owner (dbo).


SET IDENTITY_INSERT dbo.MyOrders ON;
INSERT INTO dbo.MyOrders
(
	OrderID,
	ProductName
)
VALUES
(
	1,
	'socks'
);
SET IDENTITY_INSERT dbo.MyOrders OFF;

You want to make sure to turn this off after it’s used. Otherwise, you will not be able to turn this feature on for any other table from within the same session until it is off for this table.

Monitor Queries against a Table

I recently had a need to monitor inserts against a particular table in order to determine what kind of locks they were acquiring. Being that we could not run traces on the system, I had to resort to a roll-your-own approach to monitoring the table. The goal was to determine what kind of locking was occurring, and it would also be nice to be able to associate it to the executing query.

The following code uses DMVs in order to trap a sample of the queries running against it. Please note that this will not trap all queries, however it will work in getting a good number of samples. This script will run in an endless loop so be sure to hit cancel at some point. Also, for some reason the SQL Statement will not always be trapped. It was not that important for me, as I mostly needed the locks, however if someone figures it out, please post.


-- Capture query activity against a table using DMVs
DECLARE @TableName varchar(255);

-- Specify the table you want to monitor
SET @TableName = 'Sales.SalesOrderDetail';

DECLARE @ObjectID int;
SET @ObjectID = (SELECT OBJECT_ID(@TableName));

IF OBJECT_ID('tempdb..##Activity') IS NOT NULL
BEGIN
    DROP TABLE ##Activity;
END;

-- Create table
SELECT TOP 0 *
INTO ##Activity
FROM sys.dm_tran_locks WITH (NOLOCK);

-- Add additional columns
ALTER TABLE ##Activity
ADD SQLStatement VARCHAR(MAX),
SQLText VARCHAR(MAX),
LoginName VARCHAR(200),
HostName VARCHAR(50),
Transaction_Isolation VARCHAR(100),
DateTimeAdded DATETIME;

DECLARE @Rowcount INT = 0;

WHILE 1 = 1
BEGIN

	INSERT INTO ##Activity
	SELECT dtl.*
			,SQLStatement       =
				SUBSTRING
				(
					qt.text,
					er.statement_start_offset/2,
					(CASE WHEN er.statement_end_offset = -1
						THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
						ELSE er.statement_end_offset
						END - er.statement_start_offset)/2
				)
			,qt.text
			,ses.login_name
			,ses.host_name
			,ses.transaction_isolation_level
			,DateTimeAdded = GETDATE()
	FROM sys.dm_tran_locks dtl WITH (NOLOCK)
	LEFT JOIN sys.dm_exec_sessions ses
		ON ses.session_id = dtl.request_session_id
	LEFT JOIN sys.dm_exec_requests er WITH (NOLOCK)
		ON er.session_id = dtl.request_session_id
	OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) AS qt
	WHERE dtl.resource_associated_entity_id = @ObjectID;

	SET @Rowcount = (SELECT @@ROWCOUNT)
	IF @Rowcount > 100
	BEGIN
		BREAK;
	END;
	-- Wait 50 milliseconds
	WAITFOR DELAY '00:00:00.50';

END

SELECT *
FROM ##Activity

Execution Plans

The execution plans SQL Server creates and uses is a huge topic with many intricacies. Now I have never spoken to anyone on the SQL Server Development team, however I would imagine that there are some extremely sharp people in the query optimization team. In a nutshell, this is where the magic happens in SQL Server.

Overview
An execution plan is a behind-the-scenes look at the road a query takes in order to deliver it’s final result. They are generated from the underlying data statistics combined with what the query and it’s parameters are trying to accomplish. When the initial query is read, the execution plan generation engine or “Query Optimizer” searches for the best way to deliver the results of the query in the quickest way possible. To do this, it needs to know what the underlying data looks like. This is facilitated by the statistics that are stored for each table, column, and index. With these statistics in hand, the goal is to eliminate the largest number of records as quickly as possible, and iterate through this process until the final result is delivered. That said, it is not an easy job. There are many variables that come into play when determining a query’s path. A few of these include the selection of indexes, join algorithms, join order, parallelism.

Displaying the Plan
As SQL developers, it is our job to understand how to read execution plans so we can see how the query is being executed. To view an execution plan for a query, select query -> Include Actual Execution Plan. (Or just hold ctrl + M)

Now, the execution plan will be displayed for all queries within this session, until you turn it off. Execute a query and you will see a new tab next to your results that says “Execution Plan”.

Reading
Read the execution plan from right to left. The first paths the query takes are on the right hand side and extend all the way to the left until it gets to the final node. A cost percentage is assigned to each operation.

This gives you a relative understanding of how expensive each operation is for that query. Sometimes this will lead you to find a missing index, modify an existing index, or even break the query up into multiple parts. The cost can sometimes be misleading though. It is not always that the most expensive operation is the slowest or most intensive.

Hovering your mouse over each operation will display additional details about each operation.

These drill downs include valuable information including the index or object being referenced, the columns that are output by the operation, the predicates (or filters), estimated and actual statistics.

Missing Index Hints
Many times, the execution plan will give you hints about what index could be added to speed up the query. You’ll see this at the top of the execution plan in green text.

In order to view the details of the missing index, right click on the green text and choose “missing index details”. This will open a new window with the create statement for the index.

Conclusion
Sometimes, the optimizer will not choose the best execution plan it could for a query. This can be due to a number of reasons, but personally, I try to take those reasons away from the optimizer by breaking my complicated queries up into multiple queries and using temp tables. This helps relieve the optimizer from making bad decisions. With a good amount of practice, you can begin to correlate how the query is written with the way the execution plan is laid out. This does take a lot of trial and error, so don’t be scared to change things like join orders, break things out to temp tables, use exists instead of joins.. This is the best way you will learn what is most efficient.

Understanding Batch Requests / sec

SQL Server’s Batch Requests represents the number of SQL Statements that are being executed per second. In my opinion this is the one metric that gives the best baseline on how your server is performing. It cannot be looked at by itself however. It is necessary to correlate with other metrics (especially CPU usage) in order to get an overall understanding of how your server is performing. The overall goal of any SQL Server is to be able to achieve the most batch requests/sec while utilizing the least amount of resources.

On most busy machines I’ve worked with, this counter averaged around 180 – 400 batch requests/sec during peak time. This peak throughput is heavily dependent on the architectural design of the system, including procedures, tables, and index design. One notable example of this was a busy system I worked with whose procedures were written using loops (ugh).. The average batch requests we could ever achieve was around 200. I was very surprised to initially see this, however digging deeper into the code I became less and less shocked. Ultimately I found one piece of code that affected the throughput of the entire system. It was a scalar UDF defined as a computed column in the busiest table on the system (don’t get me started). Anyway, after rewriting that one process, the system then found batch request peaks that went over 3500! The fact that the system could achieve that number now was a big achievement. It meant the overall throughput of the system was dramatically increased 10 fold and not bottle-necked in one spot.

So pay attention to this counter and realize the goal is to achieve the greatest number of batch requests while keeping the resources low (CPU, Disk, Memory).

To add this counter, open perfmon (performance monitor), click This counter can be found in performance monitor under SQLServer:SQL Statistics: Batch Requests/sec.

Batch Requests/sec Counter