<Insert Picture Here>
Designing and Tuning Large High Performance
systems: Tips and Tricks
Graham Wood
Architect, Server Technologies
1
Designing and Tuning: Tips and Tricks
NOT
Designing for Oracle
• Changes over time
• CPU
• IO
• Networks
• BUT Designing and Tuning, Large High Performance
systems is NOT about tips and tricks, or fashion
• It is about good engineering practice
2
Scaling
Some Possible Definitions
Continuous Innovation
Oracle 11g Exadata Storage
Real Application Testing
Oracle 8i Advanced Compression
Automatic Storage Management
Oracle 8 Transparent Data Encryption
Self Managing Database
XML Database
Oracle 7 Oracle Data Guard
Real Application Clusters
Flashback Query
Oracle 6 Virtual Private Database
Built in Java VM Oracle 10g
Partitioning
Partitioning Support
Oracle 5 Built in
in Messaging
Object Relational Support
Multimedia Support Oracle 9i
Oracle 2 Data Warehousing Optimizations
Parallel Operations
Distributed SQL & Transaction Support
Cluster and MPP Support
Multi-
Multi-version Read Consistency
Client/Server Support
Platform Portability
Commercial SQL Implementation
3
Database Performance Basics
Schemas and SQL Statements
• Good schema/data design is the foundation to good
database performance.
• Good schema/data design makes downstream
activities both simpler and less error prone
• Selection of Index and Partitioning Columns
• Gathering and maintenance of schema statistics
• Writing correct SQL statements
• Optimization of SQL statements
• Avoidance of serialization points and transactional
bottlenecks
Scaling
What is it?
n Linear
o Unlimited
p Maximum
q Vertical
r Horizontal
s Unprecedented
t Ultimate
u Predictable
v Diagonal
4
Scalability Quiz
Label the axes ?
80
70
60
50
40
30
20
10
0
1 2 3 4
Scaling
Some Possible Definitions
• When adding hardware resources to a system:
• Number of transactions processed increases proportionately
• Query times reduce proportionately
• Query times remain constant when data volumes increase
proportionately
• Hardware resources include
• Nodes
• CPU
• Memory
• Network
• I/O
5
What Prevents Scaling
With respect to CPUs
• Scaling is severely impacted when the following
scenarios arise
• Hardware resources are operating beyond their practical
maximum capacity for work
• Software serialization takes place
• Application e.g. Row lock contention
• Database e.g. Latch contention
• Operating System e.g. Process allocation
Scaling the Oracle Database
In theory
• Requirements when scaling the Oracle Database
• Effective hardware capacity planning
• A balanced design
• Hardware components sized to absorb peak workload
• Effective application architecture
• All software components tested and validated
• Safe working limits identified
6
Scaling the Database
In reality
• Common Application Scaling Issues
• Single Instance Scaling Issues
• Multiple Instance Scaling Issues
Common Application Issues
• Incorrect use of Sessions
• High connect/disconnect rates to the database
• High load on O/S for process creation/teardown
• High load mapping/unmapping SGA
• CPU load on establishing database state
• Incorrect use of Cursors
• Reparsing is best avoided
• Hard parses use lots of CPU and will serialize
• Soft parses scale better but they are not free
7
Database Performance Basics
Sessions & Cursors(This slide is over 10 Years old!)
400
350
300
250
200
Users
150
100
50
0
Logon Hard Soft Parsing Optimal
Logoff Parsing
EXACT
Common Application Issues
• Poor SQL
• Poor Access Paths
• Excessive resource usage (CPU, I/O)
• Poor connection management
• High numbers of connections can cause problems
• Use connection pooling/concentration in the middle-tier
• Use of Shared Servers is often just a band-aid
8
Common Configuration Issues
• Non-default initialization parameters
• I/O problems
• Poor use of available resources resulting in hot spots
• Bandwidth limitations
• Poor response times
• Use ASM to eliminate unpredictable hot spots
• Space management
• Use Automatic Undo Management
• Use Auto Segment Storage Management
Tuning Oracle
• Changes over time
• Hardware
• Software
• Functionality
• BUT Designing and Tuning, Large High Performance
systems is NOT about tips and tricks, or fashion
• It is about good engineering practice
• Correct diagnosis is key to successful tuning
9
Oracle Tuning Methods: A History
• Prehistoric (v5)
• Debug code
• Dark Ages (v6)
• Counters/Ratios
• BSTAT/ESTAT
• SQL*Trace
• Renaissance (v7/v8)
• Introduction of Wait Event instrumentation
• Move from counters to timers
• STATSPACK
• Modernity (v10)
• DB Time Tuning – Tuning using fundamental notion of time spent in database
• Multiple scoping levels
• Always on, non-intrusive
• Built into infrastructure: instrumentation, ASH, AWR, ADDM, EM
Why Do We Care About Time?
• Human time is critical to the enterprise
• Systems performance affects business goals
• Human time + technology resource time
• “Time is money”
• Performance improvement means doing things faster
Performance is always and only about time
10
<Insert Picture Here>
Database Time and
Average Active Sessions
Database Time (DB Time)
• Total time in database calls by foreground sessions
• Includes CPU time, IO time and non-idle wait time
• DB Time <> response time
• New lingua franca for Oracle performance analysis
Database time is total time spent by user
processes either actively working or actively
waiting in a database call.
11
A Single Session
Single session with Database Black Box server
Browse Read Reviews Add to Checkout
Books For One Book Cart
= time spent in database TIME
Fundamental Concepts
Database Time (DB Time) =
Total time session spent in all database calls
Active Session =
Session currently spending time in a database call
Average Activity of the Session (% Activity) =
The ratio of time active to total wall clock time
Browse Read Reviews Add to Checkout
Books For One Book Cart
= time spent in database TIME
12
Multiple Sessions
DB Time = Sum of DB Time Over All Sessions
Avg. Active Sessions = Sum of Avg. Activity Over All Sessions
At time t we have 2 active sessions
User 1
User 2
User 3
User n
TIME
t
= time spent in database
13
Visualizing DB Time
Total Database Time
Avg. Active Sessions =
Wall Clock (Elapsed) Time
User 1
User 2
User 3
User n
t0 t1
4
3
2
1
Active Sessions over time TIME
EM Performance Page
• Active Sessions by wait class over time
• Colored area = amount of DB time
• “Click on the big stuff”
14
Where to find DB Time?
• V$SYS_TIME_MODEL, V$SESS_TIME_MODEL
• STAT_NAME = ‘DB time’
• V$SYSMETRIC_HISTORY
• “Database Time Per Second”, “CPU Usage Per Sec”
• 10g units = centi-secs/sec (100xAvg. Active Sessions)
• 11g new metric “Average Active Sessions”
• V$SQL
• ELAPSED_TIME and CPU_TIME
• Wait class times:
APPLICATION, CONCURRENCY, CLUSTER, USER_IO
• V$ACTIVE_SESSION_HISTORY
<Insert Picture Here>
Active Session History
15
Active Session History (ASH)
• All ‘Active’ sessions captured every second
• Foregrounds and backgrounds are sampled
• Active foregrounds contribute to DB Time
• In-memory: V$ACTIVE_SESSION_HISTORY
• Sampling interval = 1 second
• On-disk: DBA_HIST_ACTIVE_SESS_HISTORY
• Sampling interval = 10 second
• ASH is a system-wide record of database activity
Active Sessions and DB Time
ASH sample count is value
Active sessions of active sessions function
at sample times
DB time is area
under curve ∆t = 1 sec
DB Time
t0 time t1
16
Estimating DB Time with ASH
• ASH sample counts = DB Time in seconds
• Low sample counts are less reliable
• Enables DB Time analysis over many dimensions
• Sqlid, session id, instance, service, module, action
• 10gR2
• Blocking_sid (10gR2)
• XID
• 11g
• Row source
• Execution ID
• Operation type
• Connect
• Java/SQL/PLSQL
• parse, bind, execute/fetch, close
DB Time: ASH vs Time Model
17
Where is DB Time used?
• ADDM
• EM Performance page and drill downs
• ASH report
• AWR and AWR compare periods reports
• SYSMETRICS and Server-generated Alerts
<Insert Picture Here>
Techniques:
The DB Time Method
18
The DB Time Method: Short Course
Determine where
or database
time isjust
spent,
askand reduce it!
ADDM
The DB Time Method: Process
1. Identify performance issue
2. Scope the issue
3. Set goals
4. Data capture (NO OP)
5. Investigate DB time distribution
• Identify the largest potential for improvement
6. Modify system to tune for largest gain
7. Evaluate against goals
• Repeat from step 4 if goals not met
Performance tuning by removing excess DB time
19
Investigate DB Time Distribution
• Identify uneven distributions of DB time (skew)
• => Largest potential improvement within scope
• System scope:
• Resource limits – is problem outside the DB?
• Application scope:
• Service, module, action
• Resource contention (e.g. latches)
• SQLID, rowsource
• Session scope:
• Long running SQL
• Resource contention (e.g. enqueues)
DB time
20
Identify Potential Solutions
• Session contention issues
• Kill session
• Fix application
• SQL issues
• SQL Tuning Advisor => Indexes, SQL profile
• Re-write SQL
• Design issues
• Access Advisor => Indexes, physical layout
• System issues
• Initialization parameters
• Add resources
Modify System
• Start with the largest DB time issues first
• Address root causes, not symptoms
• Match solution scope to problem scope
• Don’t tweak optimizer parameters before tuning SQL
• Proceed iteratively one fix at a time
• Concurrent fixes should be orthogonal
• Measure and validate results at each successive step
• Stop when goals are met
21
The DB Time Method: Advantages
• Tunes the one thing that affects users: Time
• Data capture scoping not necessary
• ‘Always on’ data collection
• No requirement to reproduce problem
• Works for concurrency problems such as locking
• Combines best of current methods
• Less intrusive, more inclusive
Method Summary
• DB time is the fundamental performance metric
• The method allows DB time analysis at many scopes
• Proper scoping of problems and solutions is critical to success
• DB time based diagnosis removes value judgments
• Scientific method, not sorcerer’s magic
• Performance improvement means doing the same work in
less DB Time
22
<Insert Picture Here>
Tools:
ADDM
Enterprise Manager
Reports
Tools for Applying DB Time Method
Two use-cases, one method:
1. Tuning steady-state performance
• Improve overall workload throughput or response time
• Best practice: use ADDM
2. Diagnosing transient performance problems
• Confirm and investigate reported performance issues
• Best practice: use EM real-time screens
23
Best Practice: Use ADDM
• Embedded expert system using the DB time method
• Identifies root causes behind the symptoms
• Variably scoped:
• Host to instance to SQL and even database block
• Scoped to database for RAC (new in 11g)
• Findings prioritized by impact on DB time
• Finding history allows flexible time scoping
• Directives can filter findings
• Recommendations by benefit (reduction) to DB time
24
25
Best Practice: EM Real-time Interface
• Transient (sub-hour) or immediate time scope
• Requires interactivity of UI
• ‘Click on the big stuff’
• Data visualizations display skew directly
• Takes some expertise to separate symptoms from
root causes
26
27
Conclusions
• Design of Large, High Performance Systems is
engineering, not a mystical art
• Good database design leads to
• Simpler implementation
• Scalability
• Oracle database instrumentation
• Simpler performance diagnosis
• Increases probability of success
28
Time for Your Questions
29