Exadata MAA
1. Why focus on Maximum Availability?
2. What is Maximum Availability Architecture?
3. Maximum Availability Architecture features in Exadata
4. Exadata Lifecycle Operations
5. Summary
Continuous availability
Data protection
Active replication
Production site / Replicated site
Globally Distributed Database
Zero Downtime Migration (ZDM) / RAC / FPP
Copyright © 2025, Oracle and/or its affiliates
MAA Reference Architectures
Availability service levels
✓ Performance
✓ Manageability
✓ Availability
• Redundant Network
– Redundant 100Gb/s RoCE and switches
– Client access using HA bonded networks
– Integrated HA software/firmware stack
• Exadata RDMA Memory
• Performance-optimized Flash
• Capacity-optimized Flash or Hard Disk
• Bare metal or KVM-based virtualization
Human Error Prevention!
X11M Database Server
• Zero impact major Linux upgrades, e.g. OL8 in Exadata release 23.1
• Zero impact security software upgrades including STIG compliance
• MS (Management Server) alerting of key Database and Grid Infrastructure software incidents
X11M Storage Server
• Low I/O latency preservation during unplanned and planned outages
• Tightly integrated hardware & software with auto repair of sick storage
• Exadata X10M and X11M Extreme Flash storage server with both performance and capacity-optimized flash
• Enter Real Time Insight in Exadata release 22.1. Simply zoom into one of the dashboards to observe
performance trends or shine a bright light on performance anomalies
• I/O hang detection and repair
• Health factor on predictively failed disks
• I/O and Network Resource Management
• Exachk full stack healthcheck with critical issues alerts
• Elimination of false positive drive failures
• EM failure reporting
• VLAN support and automation
• Active-Active RoCE network
• Drop BBU for Replacement
• Exadata Smart Flash Logging
• Disk confinement
• Cell-to-Cell Rebalance
• Data Accelerator Cache preservation
• Redundancy protection on cellsrv shutdown
• Corruption prevention with HARD support
• Smart Write Back Flash Cache persistence
• Auto disk management
• Cell I/O timeout threshold
• Automatic ASM mirror read on I/O error corruption
• Appliance mode support
• Cell Alert Summary
• Logical corruptions
  • Database block with correct checksum but logically inconsistent
  • Structure below the header is corrupt
  • Lost write
  • Row locked by nonexistent transaction
  • …
Source: Wikipedia
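The difference between a physical check and a logical check can be illustrated with a minimal sketch. The block layout below (a JSON body with a `row_count` field and a CRC32 checksum) is purely hypothetical and is not Oracle's block format; it only shows how a block can carry a correct checksum while being logically inconsistent.

```python
import json
import zlib


def make_block(rows, declared_count=None):
    """Build a hypothetical block: a JSON body plus a CRC32 of that body.

    `declared_count` simulates a software bug that records the wrong row
    count *before* the checksum is computed.
    """
    count = len(rows) if declared_count is None else declared_count
    body = json.dumps({"row_count": count, "rows": rows}).encode()
    return {"body": body, "checksum": zlib.crc32(body)}


def physically_ok(block):
    """Physical check: the stored checksum matches the body bytes."""
    return zlib.crc32(block["body"]) == block["checksum"]


def logically_ok(block):
    """Logical check: the header's row count matches the actual rows."""
    content = json.loads(block["body"])
    return content["row_count"] == len(content["rows"])


good = make_block(["row1", "row2"])
bad = make_block(["row1", "row2"], declared_count=3)
print(physically_ok(bad), logically_ok(bad))
```

The `bad` block passes the checksum test because the checksum was computed over already-inconsistent content, which is why a checksum alone cannot catch this class of corruption.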
Exadata : Data Protection
Corruption Detection & Prevention
On the storage cells, what happens if a drive is reported as failed but has not really failed?
• The drive or flash device is automatically power cycled to avoid a false positive drive failure
• When a storage failure occurs, redundancy is impacted
• Restoration of redundancy is prioritized in a database-aware order to preserve data
• Order of priority:
1. Control Files
2. Online logs
3. Archivelogs
4. ASM SPfile
5. Database SPfile
6. TDE key store
7. OCR
8. Standby Redo Logs
9. Wallet
10. Datafiles
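The priority order above can be sketched as a simple sort. This is only an illustration of the ordering, not of how ASM schedules its work; the `(file_name, file_type)` tuples and the `rebalance_order` helper are hypothetical.

```python
# Priority order copied from the list above (position 1 is restored first).
REBALANCE_PRIORITY = [
    "Control Files",
    "Online logs",
    "Archivelogs",
    "ASM SPfile",
    "Database SPfile",
    "TDE key store",
    "OCR",
    "Standby Redo Logs",
    "Wallet",
    "Datafiles",
]
_RANK = {name: rank for rank, name in enumerate(REBALANCE_PRIORITY, start=1)}


def rebalance_order(files):
    """Sort hypothetical (file_name, file_type) pairs by restoration priority.

    Unknown file types sort last. sorted() is stable, so files of the same
    type keep their input order.
    """
    return sorted(files, key=lambda f: _RANK.get(f[1], len(_RANK) + 1))


files = [
    ("users01.dbf", "Datafiles"),
    ("control01.ctl", "Control Files"),
    ("redo01.log", "Online logs"),
]
for name, file_type in rebalance_order(files):
    print(name, file_type)
```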
Exadata Host 1
  exa1domU1 (RAC 1)
    /dev/exadata_quorum/QD_DATAC1_EXA1DOMU1
    /dev/exadata_quorum/QD_DATAC1_EXA2DOMU1
  exa1domU2 (RAC 2)
    /dev/exadata_quorum/QD_DATAC2_EXA1DOMU2
    /dev/exadata_quorum/QD_DATAC2_EXA2DOMU2
Exadata Host 2
  exa2domU1 (RAC 1)
    /dev/exadata_quorum/QD_DATAC1_EXA1DOMU1
    /dev/exadata_quorum/QD_DATAC1_EXA2DOMU1
  exa2domU2 (RAC 2)
    /dev/exadata_quorum/QD_DATAC2_EXA1DOMU2
    /dev/exadata_quorum/QD_DATAC2_EXA2DOMU2
• Exadata includes Hardware Assisted Resilient Data (HARD) checks to prevent corruption for specific file types:
  • SPfile
  • Control files
  • Log files
  • Datafiles
  • Data Guard Broker files
https://blogs.oracle.com/exadata/post/exadata-disk-scrubbing
Conclusion Data Protection
Brownout & Quality of Service and Performance
[Chart: duration in seconds (axis ticks 0, 1, 10, 20), Exadata vs. Traditional Storage]
Exadata : Quality of Service & Performance
Storage Server Disk Confinement
[Diagram: read and write paths in Write-Through vs. Write-Back flash cache modes]
Application I/O rarely hits the hard disks or Capacity-Optimized Flash
• Beneficial when multiple concurrent workloads require hard disk I/O bandwidth (e.g. backups)
• Asynchronous flush to capacity-optimized flash or HDD
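The difference between the two cache modes can be sketched with a generic toy model. The `FlashCacheModel` class below is hypothetical and does not represent the Exadata Smart Flash Cache implementation; it only shows why write-back mode keeps writes off the slower tier until a later, asynchronous flush.

```python
class FlashCacheModel:
    """Toy cache in front of a slower tier (hard disk or capacity flash)."""

    def __init__(self, mode):
        if mode not in ("write-through", "write-back"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.cache = {}       # fast tier
        self.disk = {}        # slow tier
        self.dirty = set()    # keys written to cache but not yet to disk
        self.disk_writes = 0  # number of writes that reached the slow tier

    def _write_disk(self, key, value):
        self.disk[key] = value
        self.disk_writes += 1

    def write(self, key, value):
        self.cache[key] = value
        if self.mode == "write-through":
            self._write_disk(key, value)  # every write also hits the slow tier
        else:
            self.dirty.add(key)           # slow tier is updated later by flush()

    def read(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: slow tier untouched
        value = self.disk[key]
        self.cache[key] = value           # populate cache on a miss
        return value

    def flush(self):
        """Write all dirty entries to the slow tier (the 'flush later' step)."""
        for key in sorted(self.dirty):
            self._write_disk(key, self.cache[key])
        self.dirty.clear()


wb = FlashCacheModel("write-back")
wb.write("block-1", "v1")
wb.write("block-1", "v2")  # overwrites in cache only
wb.flush()                 # a single write reaches the slow tier
print(wb.disk_writes)
```

In this model, two writes to the same key cost two slow-tier writes in write-through mode but only one in write-back mode, which leaves more disk bandwidth for other workloads.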
Undiscovered hardware /
software issue?
• ASM requires space to allow for rebalancing of data in the event of a failure
• Ensures rebalance is successful
• Restores redundancy
• Space to ensure rebalance is successful is not reserved
• Reports ORA-15041 if there is not enough space to complete rebalance
REQUIRED_MIRROR_FREE_MB
• Depends on the number of failure groups and ASM version
• Applies to any disk group and any redundancy (HIGH or NORMAL)
• Same for all media types and hardware generations*
* Exadata X10M and newer Extreme Flash has hardware-specific requirements

Grid Infrastructure Version | Number of Failure Groups | Required % Free of Disk Group Capacity
12.1.0                      | Any                      | 15
12.2, 18.1+                 | less than 5              | 15
12.2, 18.1+                 | 5 or more                | 9

Number of Failure Groups (8 ASM disks / FG) | Redundancy | Required % Free of Disk Group Capacity to Successfully Rebalance after a single physical disk failure
less than 5                                 | NORMAL     | 15%
less than 5                                 | HIGH       | 29%
5 or more                                   | NORMAL     | 9%
5 or more                                   | HIGH       | 11%
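The percentages listed above can be turned into a small lookup, shown here as a sketch. The function names and the version-string parsing are assumptions made for illustration; only the percentages come from the tables, and versions older than 12.1 are not covered.

```python
def required_free_pct(gi_version, failure_groups):
    """Required % free by Grid Infrastructure version and failure-group count."""
    major_minor = tuple(int(part) for part in gi_version.split(".")[:2])
    if major_minor < (12, 1):
        raise ValueError("version not covered by the table")
    if major_minor == (12, 1):
        return 15  # 12.1.0: 15% for any number of failure groups
    return 15 if failure_groups < 5 else 9  # 12.2, 18.1 and later


# Table for failure groups with 8 ASM disks each, keyed by
# (has_five_or_more_failure_groups, redundancy).
_EIGHT_DISK_PCT = {
    (False, "NORMAL"): 15,
    (False, "HIGH"): 29,
    (True, "NORMAL"): 9,
    (True, "HIGH"): 11,
}


def required_free_pct_8_disks(failure_groups, redundancy):
    """Required % free when each failure group has 8 ASM disks."""
    return _EIGHT_DISK_PCT[(failure_groups >= 5, redundancy.upper())]


def required_free_mb(capacity_mb, pct):
    """Convert a percentage into MB of disk group capacity, rounded up."""
    return (capacity_mb * pct + 99) // 100


# Example: a 1,000,000 MB disk group on GI 19.0 with 7 failure groups.
pct = required_free_pct("19.0", 7)
print(pct, required_free_mb(1_000_000, pct))
```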
• X10M and newer EF cells have four physical flash disks with two ASM disks per
physical flash disk. Therefore, a flash card failure will result in two ASM disks being dropped.
• GI/ASM 19c and newer with patch 34281503
• Smart Rebalance affects High Redundancy disk groups when a failure occurs
  • If disk group has required free space
    • Data is rebalanced and redundancy restored
  • If disk group DOES NOT have required free space
    • Disk is offlined and rebalance deferred
    • Disk is re-mirrored efficiently from partner disks once replaced
• Reduces data movement and extra I/O at failure time if more capacity is required for database storage
• Results in up to 3x faster redundancy restoration
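The Smart Rebalance decision described above can be sketched as a small function. The `DiskGroup` type and the action strings are hypothetical; this models only the branching described in the bullets, not ASM internals.

```python
from dataclasses import dataclass


@dataclass
class DiskGroup:
    redundancy: str        # "HIGH" or "NORMAL"
    free_mb: int           # free space currently available
    required_free_mb: int  # free space needed to rebalance after a failure


def on_disk_failure(dg):
    """Return the list of actions this sketch takes after a disk failure."""
    if dg.redundancy.upper() != "HIGH":
        # The bullets above only describe High Redundancy disk groups.
        return ["not covered by this sketch"]
    if dg.free_mb >= dg.required_free_mb:
        return ["rebalance data", "restore redundancy"]
    return [
        "offline disk",
        "defer rebalance",
        "re-mirror from partner disks once the disk is replaced",
    ]


print(on_disk_failure(DiskGroup("HIGH", free_mb=50_000, required_free_mb=110_000)))
```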
• Disk/Flash WARMUP
Exadata : Quality of Service & Performance
Exadata built for speed
• Smart Scan
• Smart Flash Cache
• Storage Index
• “The fastest I/O operation is the one that you
don’t need to do”
• 2 Active-Active ports in every RDMA Network Fabric Adapter
• 22 ports per RDMA Network Fabric Switch (leaf switch) used for the internal cluster network, cabled so that no single point of failure exists
• Switch misconfiguration
• Excessive pause frames
Network traffic stalls may result in database instability or outages
Blackout Brownout
Systems are complex and an issue in one layer can cascade to other layers
Our Engineered Systems and MAA best practices are designed and tuned to tackle this
[Diagram: storage controllers with active and passive connections and proprietary protocol timeouts; ASM storage]
• When a storage server is shut down, the diskmon process in Grid Infrastructure on the database server is notified
[Diagram: diskmon on the database tier; mirror copies on the storage tier labeled Hot, Warm, and Cold]
• Cell with primary mirror remains populated in super low latency Data Accelerator
• Cell with secondary mirror populated in low latency flash cache
• Cell with tertiary mirror located on high latency hard disk throughout
ASM rebalance, resync, and resilver always preserve flash cache state when moving extents
Exadata : Brownout
Cell-to-Cell Rebalance Preserves Data Accelerator Population
• Exadata 21.2.*
• No blackouts
• Significantly reduced brownout
Outdated Exachk
• Automatically stream up-to-the-second metric observations from all servers in your Exadata fleet
• Feed customizable monitoring dashboards for real-time analysis and problem-solving
• Comprehensive
  • 200+ Exadata Software & Hardware Metrics
  • Fine-grained metrics can be collected as often as every 1 second
• Integrated
  • Integrated with popular time-series and observability platforms
  • Stream fine-grained metrics to user-defined endpoints in real time
• Insightful
  • Enables proactive issue detection and real-time decision making
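As an illustration of what proactive issue detection on a fine-grained metric stream could look like, here is a minimal sketch of a rolling-window outlier check. It is not part of Real-Time Insight; the function name, the window size, and the threshold are assumptions, and the input is simply a sequence of numeric samples such as one latency reading per second.

```python
from collections import deque
from statistics import mean, pstdev


def flag_anomalies(values, window=30, threshold=3.0):
    """Yield (index, value) for samples far from the recent average.

    A sample is flagged when it differs from the mean of the preceding
    `window` samples by more than `threshold` standard deviations.
    """
    recent = deque(maxlen=window)
    for index, value in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
        recent.append(value)


# Hypothetical per-second latency samples with one spike.
samples = [10, 11] * 15 + [50, 10]
print(list(flag_anomalies(samples)))
```

A dashboard or alerting rule in an observability platform would normally do this job; the sketch only shows the kind of check that one-second granularity makes possible.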
https://blogs.oracle.com/exadata/post/real-time-insight-quick-start
Exadata : Life Cycle Management
Exadata Real-Time Insight – Sample Dashboards Code
• https://github.com/oracle-samples/oracle-db-examples/tree/main/exadata/insight
Schrödinger Backup :
The condition of any backup is unknown until a restore is attempted
Exadata Live Update uses familiar Linux technologies, including RPM and
ksplice, to apply updates online to database servers/VMs avoiding the
need to reboot
Exadata Live Update offers multiple options based on the Common Vulnerability Scoring System (CVSS).
When using Exadata Live Update, you choose from the following options:
highcvss: Applies only security updates to address vulnerabilities with a CVSS score of 7 or greater
allcvss: Applies only security updates to address vulnerabilities with any CVSS score
full: Performs a full update, which includes all security-related updates and all other non-security updates. Equivalent to regular updates applied with a server/VM reboot
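The selection rules for the three options can be sketched as a filter. This models only the rules stated above and is not the Exadata patching tooling; the update records (dictionaries with `name`, `security`, and `cvss` keys) and the `select_updates` function are hypothetical.

```python
def select_updates(updates, option):
    """Return the updates that a given Live Update option would include."""
    if option == "highcvss":
        # Security updates with a CVSS score of 7 or greater.
        return [
            u for u in updates
            if u["security"] and u["cvss"] is not None and u["cvss"] >= 7
        ]
    if option == "allcvss":
        # Security updates with any CVSS score.
        return [u for u in updates if u["security"]]
    if option == "full":
        # Security and non-security updates alike.
        return list(updates)
    raise ValueError(f"unknown option: {option}")


updates = [
    {"name": "pkg-a", "security": True, "cvss": 9.8},
    {"name": "pkg-b", "security": True, "cvss": 4.3},
    {"name": "pkg-c", "security": False, "cvss": None},
]
for option in ("highcvss", "allcvss", "full"):
    print(option, [u["name"] for u in select_updates(updates, option)])
```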
Not all update content can be applied online, or activated without a reboot
• e.g. firmware, booting with the latest kernel, JDK
These updates are called ‘outstanding work’ and are staged for activation at the next graceful shutdown
Graceful reboots
• Include vm_maker --stop-domain/--start-domain operations, host restart (shutdown -r), a short press of the power button on the server, etc.
• Restarting the physical database server also restarts VMs
• Useful (but not required) to align VM and physical server reboot
• Avoid resetting VMs and physical servers while outstanding work is applied
Fleet Patching & Provisioning (FPP): the tool for out-of-place patching
• Database homes
• Grid infrastructure and combined GI + DB patching
• Also Exadata patching
• www.oracle.com/goto/FPP
• One tool to patch / upgrade your whole Oracle DB stack
Backup
• https://www.oracle.com/technetwork/database/availability/recovery-appliance-maint-practices-4487388.pdf
KVM Virtualization
• https://www.oracle.com/a/tech/docs/exadata-kvm-overview.pdf
Security
• https://www.oracle.com/a/tech/docs/exadata-maximum-security-architecture.pdf
Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)
Oracle Exadata Database Machine EXAchk (Doc ID 1070954.1)
Oracle Exadata Best Practices (Doc ID 757552.1)
Exadata Critical Issues (Doc ID 1270094.1)
Exadata Patching Overview and Patch Testing Guidelines (Doc ID 1262380.1)
The ASM Priority Rebalance feature - An Example (Doc ID 1968607.1)
Physical and Logical Block Corruptions. All you wanted to know about it. (Doc ID 840978.1)
Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard
Configuration (Doc ID 1302539.1)
Understanding ASM Capacity and Reservation of Free Space in Exadata (Doc ID 1551288.1)