How To Diagnose Random
Performance Incidents
Using ASH
Craig Shallahamer
craig@orapub.com
1 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
“My screen froze
this morning for
about 30 seconds!”
3 (c)OraPub, Inc ASH - Incident
About Me...
• Long time Oracle DBA
• Specialize in Oracle Database
performance and predictive analysis
• Performance researcher
• Blogger: A Wider View About Oracle
Performance Tuning
• Author: Oracle Performance Firefighting
and Forecasting Oracle Performance.
• Conference speaker
• Teacher and mentor
• Oracle ACE Director
• IOUG DBA Track Manager
4 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
OraPub
works
closely with
DBAs who
want to take
their Oracle
tuning skills
to the next
level.
So, you can
solve the
toughest
Oracle DB
problems.
5 (c)OraPub, Inc ASH - Incident
How connect
with Craig and OraPub
Community with a common goal
craig@orapub.com
@OraPubInc
OraPub.Com: Everything starts here!
Connect and network
6 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
Incident Analysis
The Challenge
7 (c)OraPub, Inc ASH - Incident
AWR report says
physical read IO is
the heart of the
issue.
Users say their
screen randomly
locks for a minute.
8 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
Solution Process
@ Be aware when the instance profile doesn’t match
a user(s) experience... the needle...
@ The incident can also be a BIG needle.
@ Use clues to direct us into the incident:
– date and time range
– user experience; random, functional, repeatable
– Think: merge user experience, wait events, OS and
application situation
– logs of all kinds, especially the Alert Log file.
@ Increasingly converge towards the incident core
@ Timeline linear situations
@ Chart complex relationships
9 (c)OraPub, Inc ASH - Incident
> orapub.com
>resources
> tools
10 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
Demo
ASH-TX-Demo-Craig-Steps.txt
13 (c)OraPub, Inc ASH - Incident
The Situation
Some users said when they got back
from lunch around 2pm the EBS
batch processing queue was stuck
and building up.
15 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
The Situation
The database time is set 9 hours ahead of user location
because of the data center location. So we are looking
for incident time at around 11pm.
One of the DBAs said that the Oracle alert log had a
deadlock message posted at 11:12pm.
While not widely discussed, the truth is the DBAs have
seen row locks in the batch system throughout the day
and they have been killing the sessions.
But this time, something went wrong and batch
processing effectively froze.
16 (c)OraPub, Inc ASH - Incident
Oh Oh... Never a good thing...
...
Sun May 15 23:09:48 2016
LNS: Standby redo logfile selected for thread 1 sequence 20101 for destination LOG
Sun May 15 23:09:53 2016
Archived Log entry 130818 added for thread 1 sequence 20100 ID 0xd839d567 dest 1:
Sun May 15 23:09:54 2016
ARC5: Standby redo logfile selected for thread 1 sequence 20100 for destination LO
Sun May 15 23:10:00 2016
Completed checkpoint up to RBA [0x4e85.2.10], SCN: 6479478976626
Sun May 15 23:12:20 2016
Global Enqueue Services Deadlock detected. More info in file
/u01/app/orapi/admin/diag/rdbms/prodi/PRODI1/trace/PRODI1_ora_3836.trc.
Sun May 15 23:12:20 2016
Dumping diagnostic data in directory=[cdmp_50160321131223], requested by (instance
Sun May 15 23:15:45 2016
Auto-tuning: Shutting down background process GTX2
Sun May 15 23:19:56 2016
Auto-tuning: Starting background process GTX2
Starting background process GTX2
...
Oracle’s Alert Log
17 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
Let The
Typing
Begin
ASH-TX-Demo-Craig-Steps.txt
18 (c)OraPub, Inc ASH - Incident
First, set the analysis window
1 of 2
19 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
Looking at summary session activity.
20 (c)OraPub, Inc ASH - Incident
Looking detailed session activity.
21 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
Observations
@ No table level locks, only row
@ Lock duration not crazy long
@ Sessions are not blocked by the same
session
@ The same SQL IDs are involved
@ Dead lock is clearly seen three times!
22 (c)OraPub, Inc ASH - Incident
Detailed visualization
23 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
24 (c)OraPub, Inc ASH - Incident
25 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
Read like this:
Session 1747 is
running SQL a2z
and waiting on a
TX lock held by
session 479.
Session 479
running SQL same
SQL a2z and
waiting on a TX
lock held by
session 2535. Keep
doing until you
notice the
problem!
26 (c)OraPub, Inc ASH - Incident
Observations
@ No table level locks, only row
@ Lock duration not crazy long
@ Sessions are not blocked by the same
session
@ The same couple SQL IDs are involved
@ Dead lock is clearly seen three times!
@ I visually displayed one of the deadlocks
27 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
Lessons learned
@ Be extremely careful about killing sessions,
especially in a batch environment when
programs may automatically be set to
restart.
@ Why locks occurring? Look at sql_id,
module, program and bind variables for
clues.
@ ASH data is invaluable for seeing detail
not shown anywhere else.
@ An Oracle time-based analysis would not
have solved this mystery.
28 (c)OraPub, Inc ASH - Incident
Moving
To The
Next Level
45 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
R,
visual,
ash,
incident
46 (c)OraPub, Inc ASH - Incident
R,
visual,
ash,
incident
47 (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
Resource listing
@ Craig’s Blog & Website – Search: “otba”, “ash”
@ Toolkits – BloodHound, ASH Scratch Pad, OSM
@ Presentations – Search OraPub.com: “otba”,”ash”, etc.
@ Books
– Oracle Performance Firefighting.
– Forecasting Oracle Performance.
@ OraPub Membership for premium content
– Webinars – one or two each month
– Video Seminars – any device, any time, high quality
– Learning paths, assessments and certificates, priority response
– Community SLACK team
@ Live Virtual Training – Multiple 2 hours sessions with daily break
– Oracle Tuning Fastpath
– Tuning Oracle Using An AWR Report
– Tuning Oracle Using Advanced ASH Strategies
– Oracle Buffer Cache Performance Analysis & Tuning
48 (c)OraPub, Inc ASH - Incident
Thank
50
You! (c)OraPub, Inc ASH - Incident
This presentation was given by Craig Shallahamer (craig@orapub.com)
in July 2017 at the Latin American OTN Tour 2017.
How To Diagnose Random
Performance Incidents
Using ASH
Craig Shallahamer
craig@orapub.com
51 (c)OraPub, Inc ASH - Incident