
Favourite performance tuning tricks

itboxs 2020. 7. 6. 08:06



When you have a query or stored procedure that needs performance tuning, what are the first things you try?


Here is the handy list of things I always give to someone who asks me about optimisation.
We mainly use Sybase, but most of the advice will apply across the board.

SQL Server, for example, comes with a host of performance monitoring / tuning bits, but if you don't have anything like that (and maybe even if you do), then I would consider the following...

99% of the problems I have seen are caused by putting too many tables in the join. The fix for this is to do half the join with some of the tables and cache the results in a temporaryary table, then do the rest of the query joining against that temporary table.
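A minimal sketch of that temp-table trick; the table and column names here are invented for illustration:

```sql
-- Hypothetical example: cache half of a wide join in a temp table first.
SELECT o.OrderID, o.CustomerID, o.OrderDate
INTO   #recent_orders
FROM   Orders o
JOIN   OrderStatus s ON s.OrderID = o.OrderID
WHERE  o.OrderDate >= '2020-01-01';

-- Then join the remaining tables against the (much smaller) temp table.
SELECT r.OrderID, c.CustomerName, d.ProductID, d.Quantity
FROM   #recent_orders r
JOIN   Customers    c ON c.CustomerID = r.CustomerID
JOIN   OrderDetails d ON d.OrderID    = r.OrderID;
```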

Query Optimisation Checklist

  • Run UPDATE STATISTICS on the underlying tables
    • Many systems run this as a scheduled weekly job
  • Delete records from the underlying tables (possibly archiving the deleted records)
    • Consider doing this automatically once a day or once a week
  • Rebuild indexes
  • Rebuild tables (bcp the data out/in)
  • Dump / reload the database (drastic, but can fix corruption)
  • Build new, more appropriate indexes
  • Run DBCC to check for possible corruption in the database
  • Locks / deadlocks
    • Make sure no other processes are running in the database
      • Especially DBCC
    • Are you using row- or page-level locking?
    • Lock the tables exclusively before starting the query
    • Check that all processes are accessing the tables in the same order
  • Are the indexes being used appropriately?
    • A join will only use an index if both expressions are exactly the same data type
    • An index will only be used if the first field(s) in the index are matched in the query
    • Are clustered indexes used where appropriate?
      • Range data
      • WHERE field BETWEEN value1 AND value2
  • Small joins are good joins
    • By default the optimiser will only consider 4 tables at a time
    • This means that in joins of more than 4 tables, it has a good chance of choosing a sub-optimal query plan
  • Break up the join
    • Can you break the join up?
    • Pre-select foreign keys into a temporary table
    • Do half the join and put the results in a temporary table
  • Are you using the right kind of temporary table?
    • #temp tables may perform much better than @table variables with large volumes (thousands of rows)
  • Maintain summary tables
    • Build them with triggers on the underlying tables
    • Build them daily / hourly / etc.
    • Build them ad hoc
    • Build them incrementally or tear down / rebuild
  • See what the query plan is with SET SHOWPLAN ON
  • See what's actually happening with SET STATS IO ON
  • Force an index using the pragma: (index: myindex)
  • Force the table order with SET FORCEPLAN ON
  • Parameter sniffing:
    • Break the stored procedure into two
    • Call proc2 from proc1
    • This allows the optimiser to choose an index in proc2 when proc1 has changed @parameter
  • Can you improve your hardware?
  • What time are you running? Is there a quieter time?
  • Is Replication Server (or another non-stop process) running? Can you suspend it? Run it, say, hourly instead?
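The parameter-sniffing item above (break the stored procedure into two, call proc2 from proc1) might look roughly like this; the proc, table, and parameter names are invented for the sketch:

```sql
-- Outer proc: may adjust the parameter before the real work happens.
CREATE PROCEDURE proc1 @startDate datetime AS
BEGIN
    IF @startDate IS NULL
        SET @startDate = DATEADD(day, -7, GETDATE());
    -- Because proc2 is compiled separately, the optimiser sniffs the
    -- *adjusted* value of the parameter, not the value proc1 received.
    EXEC proc2 @startDate;
END
GO

-- Inner proc: does the actual query against the final parameter value.
CREATE PROCEDURE proc2 @startDate datetime AS
BEGIN
    SELECT OrderID, CustomerID, OrderDate
    FROM   Orders
    WHERE  OrderDate >= @startDate;
END
GO
```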

  1. Have a pretty good idea in your head of the optimal path for running the query.
  2. Check the query plan - always.
  3. Turn on STATS so that you can examine both IO and CPU performance. Focus on driving those numbers down, not necessarily the query time (as that can be influenced by other activity, cache, etc.).
  4. Look for large numbers of rows coming into an operator but small numbers coming out. Usually an index can help by limiting the number of rows coming in, which saves on disk reads.
  5. Focus on the largest-cost subtree first. Changing that subtree can often change the entire query plan.
  6. Some of the common problems I have seen are:
    • If there are a lot of joins, sometimes Sql Server will choose to expand the join and then apply the WHERE clauses. You can usually fix this by moving the WHERE conditions into the JOIN clause, or into a derived table with the conditions inlined. Views can cause the same problems.
    • Suboptimal joins (LOOP vs HASH vs MERGE). My rule of thumb is to use a LOOP join when the top row has very few rows compared to the bottom, a MERGE when the sets are roughly equal and ordered, and a HASH for everything else. Adding a join hint will let you test your theory.
    • Parameter sniffing. If you ran the stored proc with unrealistic values at first (say, for testing), then the cached query plan may be suboptimal for your production values. Running again WITH RECOMPILE should verify this. For some stored procs, especially those that deal with varying sized ranges (say, all dates between today and yesterday - which would entail an INDEX SEEK - or, all dates between last year and this year - which would be better off with an INDEX SCAN) you may have to run it WITH RECOMPILE every time.
    • Bad indentation...Okay, so Sql Server doesn't have an issue with this - but I sure find it impossible to understand a query until I've fixed up the formatting.
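A sketch of moving a WHERE condition into the JOIN, or into an inlined derived table, as item 6 describes; the table and column names are illustrative (note that for INNER joins the three forms are semantically equivalent, so only the plan changes):

```sql
-- Before: the optimiser may expand the whole join, then filter.
SELECT c.CustomerName, o.OrderDate
FROM   Customers c
JOIN   Orders    o ON o.CustomerID = c.CustomerID
WHERE  o.OrderDate >= '2020-01-01';

-- After: the condition rides along with the join...
SELECT c.CustomerName, o.OrderDate
FROM   Customers c
JOIN   Orders    o ON  o.CustomerID = c.CustomerID
                   AND o.OrderDate >= '2020-01-01';

-- ...or lives in a derived table that is filtered before joining.
SELECT c.CustomerName, o.OrderDate
FROM   Customers c
JOIN  (SELECT CustomerID, OrderDate
       FROM   Orders
       WHERE  OrderDate >= '2020-01-01') o
  ON   o.CustomerID = c.CustomerID;
```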

Slightly off topic but if you have control over these issues...
High level and High Impact.

  • For high-IO environments, make sure your disks are set up as RAID 10, RAID 0+1, or some nested implementation of RAID 1 and RAID 0.
  • Don't use drives less than 1500K.
  • Make sure your disks are only used for your database, i.e. no logging, no OS.
  • Turn off auto-grow or similar features. Let the database use all the storage that is anticipated, not necessarily what is currently being used.
  • Design your schema and indexes for the types of queries you run.
  • If it's a log-type table (insert only) and it must be in the DB, don't index it.
  • If you're doing a lot of reporting (complex selects with many joins), then you should look at creating a data warehouse with a star or snowflake schema.
  • Don't be afraid of replicating data in exchange for performance!

CREATE INDEX

Assure there are indexes available for your WHERE and JOIN clauses. This will speed data access greatly.

If your environment is a data mart or warehouse, indexes should abound for almost any conceivable query.

In a transactional environment, the number of indexes should be lower and their definitions more strategic so that index maintenance doesn't drag down resources. (Index maintenance is when the leaves of an index must be changed to reflect a change in the underlying table, as with INSERT, UPDATE, and DELETE operations.)

Also, be mindful of the order of fields in the index - the more selective (higher cardinality) a field, the earlier in the index it should appear. For example, say you're querying for used automobiles:

SELECT   i.make, i.model, i.price
FROM     dbo.inventory i
WHERE    i.color = 'red'
  AND    i.price BETWEEN 15000 AND 18000

Price generally has higher cardinality. There may be only a few dozen colors available, but quite possibly thousands of different asking prices.

Of these index choices, idx01 provides the faster path to satisfy the query:

CREATE INDEX idx01 ON dbo.inventory (price, color)
CREATE INDEX idx02 ON dbo.inventory (color, price)

This is because fewer cars will satisfy the price point than the color choice, giving the query engine far less data to analyze.

I've been known to have two very similar indexes differing only in the field order to speed queries (firstname, lastname) in one and (lastname, firstname) in the other.
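Those paired indexes might be declared like this (the table name is assumed for the example):

```sql
-- Two near-duplicate indexes differing only in field order, so that
-- queries leading with either column can get an index seek.
CREATE INDEX idx_name_fl ON dbo.people (firstname, lastname);
CREATE INDEX idx_name_lf ON dbo.people (lastname, firstname);
```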


A trick I recently learned is that SQL Server can update local variables as well as fields, in an update statement.

UPDATE table
SET @variable = column = @variable + otherColumn

Or the more readable version:

UPDATE table
SET
    @variable = @variable + otherColumn,
    column = @variable

I've used this to replace complicated cursors/joins when implementing recursive calculations, and also gained a lot in performance.

Here's details and example code that made fantastic improvements in performance: http://geekswithblogs.net/Rhames/archive/2008/10/28/calculating-running-totals-in-sql-server-2005---the-optimal.aspx
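A minimal sketch of the running-total pattern the linked article describes, with invented table and column names; the technique relies on the table being clustered on the ordering column, and the syntax assumes SQL Server 2008+:

```sql
-- "Quirky update": carry a running total through a local variable.
-- dbo.transactions is assumed to be clustered on its date/order column.
DECLARE @running money = 0;

UPDATE t
SET    @running = RunningTotal = @running + Amount
FROM   dbo.transactions t WITH (TABLOCKX);
```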


Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible and try to eliminate file sorts. High Performance MySQL: Optimization, Backups, Replication, and More is a great book on this topic as is MySQL Performance Blog.


@Terrapin there are a few other differences between isnull and coalesce that are worth mentioning (besides ANSI compliance, which is a big one for me).

Coalesce vs. IsNull


Sometimes in SQL Server if you use an OR in a where clause it will really jack with performance. Instead of using the OR just do two selects and union them together. You get the same results at 1000x the speed.
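A sketch of the OR-to-UNION rewrite against a made-up table (use UNION ALL instead if the two branches can never overlap, to skip the duplicate-removal step):

```sql
-- Before: an OR across two different columns often defeats index use.
SELECT CustomerID, LastName, City
FROM   Customers
WHERE  LastName = 'Smith' OR City = 'Portland';

-- After: two index-friendly selects, unioned together.
SELECT CustomerID, LastName, City FROM Customers WHERE LastName = 'Smith'
UNION
SELECT CustomerID, LastName, City FROM Customers WHERE City = 'Portland';
```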


Look at the where clause - verify use of indexes / verify nothing silly is being done

where SomeComplicatedFunctionOf(table.Column) = @param --silly
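The usual fix is to move the computation off the column and onto the parameter, so the predicate stays sargable and an index on the column can be used. A hypothetical example (DATEFROMPARTS needs SQL Server 2012+):

```sql
-- Silly: the function call hides the column from any index.
-- WHERE YEAR(o.OrderDate) = @year

-- Better: compare the bare column against a computed range.
SELECT OrderID, OrderDate
FROM   Orders o
WHERE  o.OrderDate >= DATEFROMPARTS(@year, 1, 1)
  AND  o.OrderDate <  DATEFROMPARTS(@year + 1, 1, 1);
```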

I'll generally start with the joins - I'll knock each one of them out of the query one at a time and re-run the query to get an idea if there's a particular join I'm having a problem with.


On all of my temp tables, I like to add unique constraints (where appropriate) to make indexes, and primary keys (almost always).

declare @temp table(
    RowID int not null identity(1,1) primary key,
    SomeUniqueColumn varchar(25) not null,
    SomeNotUniqueColumn varchar(50) null,
    unique(SomeUniqueColumn)
)

I've made it a habit to always use bind variables. It's possible bind variables won't help if the RDBMS doesn't cache SQL statements. But if you don't use bind variables the RDBMS doesn't have a chance to reuse query execution plans and parsed SQL statements. The savings can be enormous: http://www.akadia.com/services/ora_bind_variables.html. I work mostly with Oracle, but Microsoft SQL Server works pretty much the same way.

In my experience, if you don't know whether or not you are using bind variables, you probably aren't. If your application language doesn't support them, find one that does. Sometimes you can fix query A by using bind variables for query B.
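In T-SQL, the server-side equivalent of bind variables is sp_executesql with parameters, as opposed to concatenating values into the statement string; the table and column here are assumptions for the sketch:

```sql
-- Parameterised: one cached plan is reused for every @name value.
EXEC sp_executesql
     N'SELECT CustomerID FROM Customers WHERE LastName = @name',
     N'@name nvarchar(50)',
     @name = N'Smith';

-- Concatenated: every distinct value compiles a brand-new plan.
-- EXEC ('SELECT CustomerID FROM Customers WHERE LastName = ''Smith''')
```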

After that, I talk to our DBA to find out what's causing the RDBMS the most pain. Note that you shouldn't ask "Why is this query slow?" That's like asking your doctor to take out your appendix. Sure, your query might be the problem, but it's just as likely that something else is going wrong. As developers, we tend to think in terms of lines of code. If a line is slow, fix that line. But an RDBMS is a really complicated system and your slow query might be the symptom of a much larger problem.

Way too many SQL tuning tips are cargo cult idols. Most of the time the problem is unrelated or minimally related to the syntax you use, so it's normally best to use the cleanest syntax you can. Then you can start looking at ways to tune the database (not the query). Only tweak the syntax when that fails.

Like any performance tuning, always collect meaningful statistics. Don't use wallclock time unless it's the user experience you are tuning. Instead look at things like CPU time, rows fetched and blocks read off of disk. Too often people optimize for the wrong thing.


First step: Look at the Query Execution Plan!
TableScan -> bad
NestedLoop -> meh warning
TableScan behind a NestedLoop -> DOOM!

SET STATISTICS IO ON
SET STATISTICS TIME ON


Running the query using WITH (NoLock) is pretty much standard operation in my place. Anyone caught running queries on the tens-of-gigabytes tables without it is taken out and shot.


Convert NOT IN queries to LEFT OUTER JOINS if possible. For example if you want to find all rows in Table1 that are unused by a foreign key in Table2 you could do this:

SELECT *
FROM Table1
WHERE Table1.ID NOT IN (
    SELECT Table1ID
    FROM Table2)

But you get much better performance with this:

SELECT Table1.*
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.ID = Table2.Table1ID
WHERE Table2.ID is null

@DavidM

Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible...

In SQL Server, the execution plan gets you the same thing - it tells you what indexes are being hit, etc.


Index the table(s) by the column(s) you filter by


Not necessarily a SQL performance trick per se, but definitely related:

A good idea would be to use memcached where possible, as it would be much faster just fetching the precompiled data directly from memory rather than getting it from the database. There's also a flavour of MySQL that has memcached built in (third party).


Make sure your index lengths are as small as possible. This allows the DB to read more keys at a time from the file system, thus speeding up your joins. I assume this works with all DB's, but I know it's a specific recommendation for MySQL.
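In MySQL specifically, one way to keep keys short is to index only a prefix of a long string column; the table and column names here are made up:

```sql
-- MySQL: index only the first 10 characters of a long varchar.
-- Shorter keys mean more keys fit in each page read from disk.
CREATE INDEX idx_email_prefix ON users (email(10));
```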


I look out for:

  • Unroll any CURSOR loops and convert into set based UPDATE / INSERT statements.
  • Look out for any application code that:
    • Calls an SP that returns a large set of records,
    • Then in the application, goes through each record and calls an SP with parameters to update records.
    • Convert this into a SP that does all the work in one transaction.
  • Any SP that does lots of string manipulation. It's evidence that the data is not structured correctly / normalised.
  • Any SP's that re-invent the wheel.
  • Any SP's where I can't understand what they're trying to do within a minute!
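The cursor-unrolling item above might look like this in miniature (table and column names invented):

```sql
-- Instead of a cursor that fetches each payment and updates one
-- account per iteration, a single set-based UPDATE joined to the
-- driving set does all the work in one statement:
UPDATE a
SET    a.Balance = a.Balance + p.Amount
FROM   Accounts a
JOIN   PendingPayments p ON p.AccountID = a.AccountID;
```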

SET NOCOUNT ON

Usually the first line inside my stored procedures, unless I actually need to use @@ROWCOUNT.


In SQL Server, use the nolock directive. It allows the select command to complete without having to wait, usually for other transactions to finish.

SELECT * FROM Orders (nolock) where UserName = 'momma'

Remove cursors wherever they are not necessary.


Remove function calls in Sprocs where a lot of rows will call the function.

My colleague used function calls (getting lastlogindate from a userid, for example) to return very wide recordsets.

Tasked with optimisation, I replaced the function calls in the sproc with the function's code: I got many sprocs' running time down from > 20 seconds to < 1.
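Inlining the function typically means replacing the per-row call with a join (or APPLY) over the same logic; a hypothetical before/after, with names invented for the sketch:

```sql
-- Before: the scalar UDF runs once per row in the result set.
-- SELECT u.UserID, dbo.GetLastLoginDate(u.UserID) FROM Users u;

-- After: the function body becomes a join the optimiser can plan around.
SELECT    u.UserID, MAX(l.LoginDate) AS LastLoginDate
FROM      Users u
LEFT JOIN Logins l ON l.UserID = u.UserID
GROUP BY  u.UserID;
```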


  • Prefix all tables with dbo. to prevent recompilations.
  • View query plans and hunt for table/index scans.
  • In 2005, scour the management views for missing indexes.

I like to use

isnull(SomeColThatMayBeNull, '')

Over

coalesce(SomeColThatMayBeNull, '')

When I don't need the multiple argument support that coalesce gives you.

http://blog.falafel.com/2006/04/05/SQLServerArcanaISNULLVsCOALESCE.aspx


Don't prefix Stored Procedure names with "sp_" because system procedures all start with "sp_", and SQL Server will have to search harder to find your procedure when it gets called.


Dirty reads -

set transaction isolation level read uncommitted

Prevents deadlocks where transactional integrity isn't absolutely necessary (which is usually the case)


I always go to SQL Profiler (if it's a stored procedure with a lot of nesting levels) or the query execution planner (if it's a few SQL statements with no nesting) first. 90% of the time you can find the problem immediately with one of these two tools.

Source: https://stackoverflow.com/questions/18783/favourite-performance-tuning-tricks
