MS SQL

SQL Server Query Execution Plan

darkturtle26 2019. 2. 10. 09:18


아래 링크 리뷰 및 테스트 후, 발췌함

<< 링크 >> 
-. SQL Server query execution plans – Basics
-. SQL Server Execution Plans overview
-. SQL Server query execution plans – Viewing the plans
-. SQL Server query execution plans – Understanding and reading the plans
-. SQL Server query execution plans – Examples with the SELECT statement
-. SQL Server query execution plans – Examples with the WHERE clause




<< SQL Server query execution plans – Basics >>
Why is query execution important for SQL Server performance ? 
SQL Server performance monitoring depends on the performance goals.
The first setp in monitoring is to determine optimal performance fro your server and create performance trends based on the captured metric information
If performance is not satisfactory, It's necesaary to diagnose performance problems and find their orgin.

you'd like the results to appear quickly. If not , is there anything that can be done to make ther query run faster ?

The SQL Server query execution plan can answer this question and help you diagnose the problem, if it exists. 
The execution plan can also help you write efficient queries, create the right indexes that quickly provide only the rows you need,
Instead of searching through millions of records, etc.

What is a SQL Server rquery execution plan ?
A query plan, execution plan, or query execution plan is an algorithm showing a set of steps in a specific order that is executed to access data in a database.
A query plan show how a query was executed, or how it will be executed which is significant for troubleshooting query performance issues.
When a new query is executed, Query Optimizer evaluates the query plan, optimizes and compiles it, and stores it in the plan cache.
The plan cache is a part of SQL Server buffer where data and query plans are stored ( buffered ), so they can be reused later

A very useful characteristic of query plans isthat when a stored procedure is executed,
the query plan is created for the store procedure name and same query plan will be reusd whenever the stored procedure is executed,
despite the value specified for procedure parameters.
When it comes to executing ad hoc queries, query plans are created based on complete code, so different parameters or any change in code will prevent reuse of the existing plan.
This clearly indicates what should be done to make your code run faster - wrap it up as stored procedures or functions, and the existing query plans will be reused
and threfore code will be executed much faster.

Optimizer forad hoc workloads ( 2008~ ) : Instead of buffering the whold plan, when the option is set to "True", only a fragment of the plan is buffered when the query is executred for the firsttime.
When an ad hoc query is executed for the second time, its complete plan is buffered in the cache.

-- Advanced Option 활성화
sp_configure 'show advanced options',1
reconfigure
go
-- 설정 확인
sp_configure 'optimize for ad hoc workloads'
go
-- 설정 변경
sp_configure 'optimize for ad hoc workloads',1
go
reconfigure
go
-- 설정 확인
sp_configure 'optimize for ad hoc workloads'
go

Howerver, keep in mind that this is an advanced option, therefore recommended to be modified only by advacned users

How to remover query plans in SQL Server ?

To remove all query plans from cache manually, use the folloowing statement

DBCC FREEPROCCACHE WITH NO_INFOMSGS

Using this statement is recommended for advanced users only and it should be never be used on a production server 
before previous analysis of the instance state, as it will cause recompilation of stored procedures and thus slowdown the system performance.

How to recompile query plans in SQL Server ? 
Query plans are automatically recompiled whenever a cache plan becomes invalid.
A plan can become invalid ( or obsolte ) if there is a database change.
The most common causes are changes the objects used by the stored procedure: the stored procedure itself, the referenced table or view, indexes by the plan, statistics,
significant key data changes, etc.
Next time the same stored is executed, Query Optimizer will find an invalid plan in the cache, and will recompile a new one.
The good news is that starting with SQL Server 2005, only the statements that need recompilation are recompiled, not the whold batch

A SQL Server query plan can also be recompiled explicitly when the stored procedure is executed using the WITH RECOMPILE clause, or the sp_recompiled store procedure is executed.

As show, a query plan reuse makes the time needed for execution shorter, therefore it's recommended to save the frquently used quries as store procedures.






<<SQL Server Execution Plans overview>>

When you submit a T-SQL query, you tell the SQL Server Engine what you want, but without specifiying how to do it for you.
the SQL Server Engine will perform four internal query processing operations
If the submitted query is not a Data Manipulation Language ( DML ) statement, such as CREATE TABLE or ALTER TABLE, there will be no need to optimize that query,
as there is only one straight way for the SQL Server Engine to perform that action.
The four processing steps that are performed by both the SQL Server Relational Engine and SQL Server Storage Engine are summarized as blow :



Having a large unumber of ad hoc queries running on the databas will prevent the execution plans from being reused and require continuous plans generations,
that should be considered when configure the workload typesetting.
In sepcific situations, the SQL Server Query Optimizer prefers to create a basic plan, also known as a trivial plan, to execute the queris that have no aggregations or complex calculations,
rather than spending time and consuming resources in generateing the most efficient plan for that query



The SQL Server Database Administrators depend frequently on Execution Plans in troubleshooting the performance of the T-SQL queries and locating the worst performing parts.
The Execution Plan alsow provides DBAs with answer for different questions, such as why the query is slow and taking a long time, consuming large amount of CPU, Memory or I/O resources,
or whyt this index is not used in the query. 
In addition, the Execution Plan helps the DBAs to write queries in a more efficient way and choose the most suitable and highest recommended index to speeds up data retrival.
To be able to use the execution plan in tuning the performance of the queries, a DBA should have the required skill to create the execution plans, understand its different types
and analyze its components and operartors.


<< SQL Server query execution plans – Viewing the plans >>

An estimate execution plan is a SQL Server query plan that is generated without actually running the query ( or stored procedure) the plan is created for.
It's based on estimation of expected behavior.

An actual execution plan is the SQL Server query plan that is generated after a query was executed. 
It's more reliable, as it's based on the actual exection, not estimates. 
It also provides more information and statistics, therefore it's much useful in troubleshooting

Free product : ApexSQL  >> https://www.apexsql.com/sql-tools-plan.aspx

Use the query Cache 
One of the options to see query plans is to query the content of the plan cache using Dynamic Management Views ( DMVs )

The sys.dm_exec_cache_plans view show one row for every query plan stored in the plan cache.
The view shows query text, memory used, and how many times the plan was reused.
The sys.dm_exec_sql_text view show the SQL Batch text, identified by sql_handle

SELECT qp.query_plan,
       CP.usecounts,
       cp.cacheobjtype,
       cp.size_in_bytes,
       cp.usecounts,
       SQLText.text
  FROM sys.dm_exec_cached_plans AS CP
  CROSS APPLY sys.dm_exec_sql_text( plan_handle)AS SQLText
  CROSS APPLY sys.dm_exec_query_plan( plan_handle)AS QP
  WHERE objtype = 'Adhoc' and cp.cacheobjtype = 'Compiled Plan'

SET STATISTICS PROFILE ON : execute the query, show the results and text of the actual query plan

Use SQL Server Profiler 
A query execution plan can also be captured in a SQL Server trace and opened in SQL Server profiler
  1. Start SQL Server profiler
  2. In the File Menu, select New trace
  3. In the Events Section tab, check Show all events
  4. Expand the Performance node
  5. Select Showplan XML
  6. Execute the query you want to see the query plan for
  7. Stop the trace.
  8. Select the query plan is the grid 
Profile을 통해 수집은 했는데, Plan 보는것 까지는 어떻게 하는지 모르겠음 ㅇㅇ;

This method is not recommended due to serveral downsides,
SQL Server Profiler adds some overhead that affects query performance.
Another reason is that filtering the events and finding the specific one among thousands of records in not easy in SQL Server profiler.


<< SQL Server query execution plans – Understanding and reading the plans >>

General guideline for reading a SQL Server query execution plan
-. Each query in a batch is shown in the plan
-. Each icon presents an operator
-. Icons are read from right to left, top to bottom
-. The cost of each query is presented as a percentage of the total batch cost.
   The cost is the time needed to execute a statement / query / batch .
-. The arrow width depends on the actual/estimate number of rows. The arrows that represent a large number of row are thicker

Operators show how queries and statements are executed.
They can be physical and logical.
Logical operators present an operation on a conceptual level,
while physical operators implement the operation defined by the logical operator using a specific method.
If there is any kind of warning, it will be indicated by the yellow triangle on the icon.
The warning text is shown in the operator tooltip

The conditional statements, such as IF ... THEN ... ELSE are presented with 3 children.
WHILE and DO-UNTIL statements are represented similarly

IF
(SELECT COUNT(*) FROM [Person].[Address] WHERE City LIKE 'Bothell' ) > 0
        (SELECT COUNT(*)
         FROM [Person].[Address]
         WHERE City LIKE 'Bothell%')
Else
        (SELECT COUNT(*)  
         FROM [Person].[Address]
         WHERE AddressLine2 is null)





T-SQL statements and stored procedure are presented as tree roots.
Statements called by the stored procedure are presented as children in the tree.

select * from
[dbo].[ufnGetContactInformation]
(
   1
)























Data manipulation language (DML) statements SELECT, INSERT,DELETE, and UPDATE are also presented as tree roots.
The firsta child represents the execution plan for the statements.
If the statement fires a trigger, its' represented as the second child.

The DECLARE CURSOR statement is shown as the tree root. The statment it refers to is shown as a child 

DECLARE vend_cursor CURSOR FOR SELECT * FROM Purchasing.Vendor
OPEN vend_cursor
FETCH NEXT FROM vend_cursor ;



Operator tooltips 


Estimate operator cost - the cost of the operation, presented as a percentage of total batch cost 
Estimate I/O cost, Estimate CPU cost - Query analzyer creates serveral execution plans, calculates the cost of each ( how much I/O and processor resources are needed 
             to complete the operation), and determines the most efficient one 
Estimate row size - the number of rows that is believed to be affected by the operation, useful for determining how much memory is needed.
Actual and Estimated number of rows - even if there is significant descrepancy between these two values, it doesn't indicate a problem. It's usually caused by out of data statstics
Actual and Estimated execution mode - row or batch, show whethers the rows are processed one at a time or in batches.



<< SQL Server query execution plans – Examples with the SELECT statement >>

Clustered and nonclustered indexes
A nonclustered index contains a pointer to the data row that contains the key value.
A clustered index stores and sorts the data based on the key values.
Searching for a specific value in a clustered table ( a table with a clustered key) is easy like searching for a name in an alphabetically ordered address book.
When there is no clustered index on the table, the data is unordered and searching for a specific value requires more time and resources.

SELECT * on a table with a clustered index 

CREATE TABLE [Person].[Address](
    [AddressID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
    [AddressLine1] [nvarchar](60) NOT NULL,
    [AddressLine2] [nvarchar](60) NULL,
    [City] [nvarchar](30) NOT NULL,
    [StateProvinceID] [int] NOT NULL,
    [PostalCode] [nvarchar](15) NOT NULL,
    [SpatialLocation] [geography] NULL,
    [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
    [ModifiedDate] [datetime] NOT NULL,
CONSTRAINT [PK_Address_AddressID] PRIMARY KEY CLUSTERED
(
        [AddressID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
 
CREATE NONCLUSTERED INDEX [IX_Address_StateProvinceID] ON [Person].[Address]
(
    [StateProvinceID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


SELECT * FROM Person.Address




Table and Index scans should be avoided as they are resource expensive and require more time.
Having a seek operator in the query execution plan is preferred over scan.
As the scan operator read every row, its cost is proportional to the number of  table rows.
The seek operator does not read all records, just the specific ones that correspond to the query condition
Whenever possible, use a list of columns in the SELECT statement and a WHERE clause to narro down the rows and columns returned by the query



Although the estimate cost values have no units, it's the estimated time in seconds, needed to execute the analyzed query.
The cost is calculated by Query Optimizer.

SELECT <column list> on a table with a clustered index 











>>> Plan operation 이 조금 달라서, SSMA 로 추가 확인 

SELECT * on table with on clustered indexes.

-- 테이블 복사
SELECT *
INTO   Person.Address1
FROM   Person.Address ;
-- 인덱스 별도 생성
CREATE UNIQUE NONCLUSTERED INDEX [IX_Address_AddressLine1_AddressLine2_City_StateProvinceID_PostalCode] ON [Person].[Address1]
(
       [AddressLine1] ASC,
       [AddressLine2] ASC,
       [City] ASC,
       [StateProvinceID] ASC,
       [PostalCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF
, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO




When all rows from a table are returned ( a SELECT statement without a WHERE clause ) and no clustered index is created on the table, the engine has to go through the whole table, scanning row after row


<<SQL Server query execution plans – Examples with the WHERE clause>>

SELECT with WHERE on a clustered index 








the database engine scans a particular range of rows from the clustered index.
It seeksfor the clustered index, not scan for it, therefore we can expect lower query cost 

SELECT with WHERE on a nonclustered index 














APEXSQL 에서 결과가 조금 달라서, SSMS 에서 다시 수행 



This is not an error, or warning, you should consider this message as an advice what you can do to improve the query execution performance by almost 90%.
To see the recommended index, right-click the plan and select the Missing Index details option.

However, it's highly recommended not to create all indexes that are reported as missing by Query Optimizer in the query exection plans.
Keep in mind that it's only a recommendation for the specific plan and that it doesn't mean that the whole system and workload would benefit from the index.
As indexing has benefits as well as downsides, It's necessary to determine whether this index will really improve overall performance.
There are serveral factores you should consider when determining necessity of the index.
The first one is how often the query is executed.

A Key Lookup indicates that obtaining all records is done in two steps.
To return all records in a single step, all needed columns should be returned by the index scan 
The simplest solution is to specify the list of columns returned by the index scan 



Whenever there is a key Lookup operator, it's followed by the Nested Loops which is actually a JOIN operator which combines the data returned by the Index scan and key Lookup operators.
As show in the tooltip, the index seek is performed only once ( Number of executions = 1 ) and it returns 16 rows ( Actual number of rows )
The Key Lookup operator uses the 16 rows returned by the index seek and looks up for the records in the table.
A lookup is performed for every row returned by the index seek, in this example 16 times, which is presented by the Key Lookup Number of executions.

For a small number of rows returned by Index Seek, this means a small number of executjions by the key Lookup.
When the Key Lookup output is indexed(= Covered Index (?)), the time needed to perform a lookup is shorter, so this is a scenario with as small cost.