
CLARiiON Performance Practices

Dave Zeryck

© Copyright 2008 EMC Corporation. All rights reserved. 1


Agenda

• User and Usage Profile
• Basic Rules Common to All Profiles
• Random Implementation
  – Layout Considerations
  – The Cache
  – Sizing
• Sequential Implementation
  – Layout Considerations
  – The Cache
  – Sizing
• Managing a Mixed Environment
• Special Focus: Oracle ASM and VMware



User and Usage Profile

• We'll focus on two 'performance character' categories:
  – Random
  – Sequential
• These two characteristics dominate the design process
• Within each category, different applications and approaches behave very similarly
  – We are not going to over-think this problem
• A bit of focus on special cases:
  – Oracle Automated Storage Management (ASM)
  – Microsoft Exchange
  – VMware


User and Usage Profile

• Random applications
  – File systems
    ▪ Includes some media implementations
    ▪ Includes storage for virtual hosts (VMware)
  – Virtual hosting (VMware boot)
  – Messaging/email (Exchange)
  – Oracle ASM – DWSE is a special case of large random
  – RDBMS (OLTP)
    ▪ Logs are sequential, but not a demanding sequential environment
  – Service-model IT departments
    ▪ Handing off chunks of storage to various users
• Sequential applications
  – Media (but not as sequential as you might think!)
  – Backup/archive/replication
  – Satellite/instrumentation download
  – Geographic Information Systems (GIS)


Basic Rules Common to All Profiles

• Don't mix random and sequential on the same drives or the same back-end bus
  – You can put them together on the same array: segregate them on different disks and
    back-end buses
• Don't overcommit the read cache
  – You don't need a lot – 20% of total cache is enough: 50 MB to 250 MB
  – Use the rest for write cache
• Don't focus strictly on capacity
  – Big drives are low cost/GB, but you need sufficient spindles for your workload
• Avoid using the vault for your heavily response-time-sensitive data
  – The first 5 drives in a CLARiiON
  – They can take a lot of load, but will slow Navisphere if you overload them
  – Less risk for the host with CX4 Write Cache Availability
• Use the right drives
  – FC/SAS for mission-critical, high-transaction work
  – SATA is best for large sequential, nearline, or any mix with low IOPS rates


Rules for Random Access: Layout

• Go wide
  – Don't fret mixing applications; random access is random no matter how you mix it
  – Apps rarely peak at the same time
• Use metaLUN or host striping to distribute I/O
• Avoid 'islands' of performance (aka hot spots)

[Diagram: APP1, APP2, and APP3 striped together across metaLUNs on shared drives,
rather than each app isolated on its own small set of drives]


Rules for Random Access: Layout

• RDBMS/Exchange logs: alternate with data disks unless you are really big
  – RARELY do you need to split out logs for performance
    ▪ Sequential small writes will hit cache, and will not fill it
    ▪ Sequential reads (archive) are easily served
    ▪ Make sure your data table access does not fill the cache
  – Sharing drives gives the maximum drive count to the busy tables
    ▪ Helps keep that cache from filling
  – Alternating with another SG's data maintains protection

  DB1 Data | DB2 Data
  DB2 Log  | DB1 Log


Rules for Random Access: Layout

• RAID 5 is the all-around price/performance champion
• Other RAID types matter if you have a high write load (>20%)
• When to use RAID 1/0
  – If your writes are higher than 20%, RAID 1/0 will absorb them with the least impact
    on reads. RAID 5 will still give good performance, but reduces the top line
    achievable for that number of drives.
  – If your writes are >40%, you really should be looking at RAID 1/0.
    ▪ This is the case with Microsoft Exchange
• When to use RAID 6
  – Relatively low write rate or low write percentage (15% or less)
    ▪ Reads are about the same as RAID 5 and RAID 1/0
  – Data integrity is paramount

  10-drive use-case example – random 4 KB at 30% write, 2,100 IOPS offered:
    RAID 1/0: disks @ 70%, response time 13 ms
    RAID 5:   disks @ 92%, response time 52 ms
    RAID 6:   you can't get there from here!


Rules for Random Access: Layout

• RAID group size (stripe width) does not matter for random performance
  – 3+1, 7+1, 5+1, 10+1 – whatever makes sense for the cost/GB, geometry of the
    layout, etc.
  – Larger groups take longer to rebuild, thus lower availability
• Fix alignment issues before laying down data
  – Windows (before Windows Server 2008) and Linux on Intel are affected
  – Use DISKPAR or FDISK to align at 64 KB (128 blocks)
  – VMware:
    ▪ For an RDM, use DISKPAR at the VM level
    ▪ For VMFS, use FDISK at the ESX level and again DISKPAR at the VM level
• What, you forgot to align? Is it worth rebuilding?
  – Take your predominant I/O size in KB and divide by 64. That's the percentage of
    performance you are losing.
  – LARGE I/O is MORE affected than small I/O!
    ▪ Exchange 2003 (4 KB): very small impact
    ▪ Exchange 2007 (8 KB), SQL Server (64 KB), Oracle ASM (8 KB–1 MB): larger impact

Example: Oracle 9, most I/O is 8 KB. 8/64 = 12.5% ← that's the amount
of performance 'wasted' due to misalignment
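The divide-by-64 rule above can be sketched as a quick calculation (a sketch; the function name is mine, and the 64 KB figure is the element size assumed in the slide):

```python
def misalignment_loss_pct(io_size_kb, element_kb=64):
    """Approximate performance lost to misalignment: the fraction of I/Os
    that straddle an element boundary, io_size / element_size, capped at 100%."""
    return min(io_size_kb / element_kb, 1.0) * 100

# Oracle 9, 8 KB I/O: 8/64 = 12.5% of performance wasted
print(misalignment_loss_pct(8))    # 12.5
# SQL Server, 64 KB I/O: every misaligned I/O is split
print(misalignment_loss_pct(64))   # 100.0
```

This also shows why large I/O is more affected than small: the larger the I/O relative to the element, the larger the fraction of requests split across two drives.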
Rules for Random Access: Layout

• Oracle ASM: large, random I/O
  – ASM does striping and distribution, so do not over-stripe on the CLARiiON
    ▪ 1 or 2 LUNs per RAID group
  – Pre-R26: RAID 1/0 striped at 256 blocks works well (you need to use CLI to do that)
  – Be square! 2+2, 4+4 matches the stripe to the I/O size
  – Align! Important for large I/O
  – Turn off read cache for data LUNs – it probably won't get used
• Maximize CLARiiON 'multi-stripe read' in Release 26
  – RAID 1/0 2+2 is great here. It pushes large reads to the drives (up to 512 KB).
  – Use a striped metaLUN with a multiplier of 8 to get larger partitions and 'spread it wide.'
  – This will send I/O as large as 1 MB to the component LUN, and I/O of up to 512 KB to the drive.

[Diagram: SPA and SPB metaLUNs (stripe multiplier = 8, effective meta stripe depth
1 MB) built from component LUNs (128 KB stripe) on 2+2 groups]
Rules for Random Access: The Cache

• Set the page size to the predominant I/O size, or 8 KB if really mixed
  – Changing the page size requires disabling the cache, so you have to plan ahead
• If you have high latency during writes, lower the watermarks
  – Easy, nondisruptive, nondestructive to do
  – This provides more 'reserve space' in the cache to absorb bursts of writes
  – CX4-480 and smaller: high watermark should be 20% higher than the low watermark
  – CX4-960: high watermark should be 10% higher than the low watermark

[Chart: percent cache used over time against the high and low watermarks – idle
flush below the low watermark, hi-water flush above the high watermark, forced
flush near 100%. To absorb bursts, we reserve the cache space above the high
watermark; set the watermarks lower if you hit too many forced flushes.]
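Interpreting "20% higher" as 20 percentage points (in line with the classic 60/80 defaults), the watermark rule of thumb could be sketched like this (the function name and model handling are illustrative only, not a Navisphere API):

```python
def suggest_watermarks(low_pct, model="CX4-480"):
    """Rule-of-thumb watermark pair: high = low + 20 points for the CX4-480
    and smaller, low + 10 points for the CX4-960, capped at 100%."""
    gap = 10 if model == "CX4-960" else 20
    return low_pct, min(low_pct + gap, 100)

print(suggest_watermarks(60))             # (60, 80)
print(suggest_watermarks(70, "CX4-960"))  # (70, 80)
```

Lowering both numbers together widens the reserve space above the high watermark, which is what absorbs write bursts.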
Rules for Random Access: The Cache

• Try turning off prefetch for random-access LUNs
  – Some file systems store data in short 'runs' that trigger a lot of prefetch with
    no payoff
  – Easy, nondisruptive
  – ESPECIALLY for Oracle ASM on Linux!
    ▪ Oracle ASM's default 1 MB chunk gets split into two 512 KB I/Os by Linux; this
      spoofs the CLARiiON into prefetching – which is wasted
• For randomized sequential (a lot of media applications), increase the prefetch
  multipliers
  – CCTV is a classic case of this
  – Easy, nondisruptive


Rules for Random Access: Sizing

• Calculate the total I/O requirement
  – RAID 1 and 1/0 require that two disks be written for each host write
    ▪ Total I/O = Host Reads + 2 × Host Writes
  – RAID 5 requires four operations per host write: 2 reads and 2 writes
    ▪ Total I/O = Host Reads + 4 × Host Writes

  RESULT: the total back-end I/O the disks must absorb

• Calculate the number of disks needed
  – Rule-of-thumb rates (70% utilization, very random read targets)
    ▪ 15K rpm FC/SAS: 180 IOPS
    ▪ 10K rpm FC/SAS: 140 IOPS
    ▪ 7.2K rpm SATA: 80 IOPS

  NUMBER OF DISKS = Total I/O / IOPS per disk



Rules for Random Access: Sizing

• Calculate the IOPS requirement – example

  TOTAL I/O
  RAID 6:              Total I/O = Host Reads + 6 × Host Writes
  RAID 5:              Total I/O = Host Reads + 4 × Host Writes
  RAID 1 and RAID 1/0: Total I/O = Host Reads + 2 × Host Writes

  Example host load: 5,200 random IOPS, 60% reads

  RAID 5:   Total I/O = 0.6 × 5,200 + 4 × (0.4 × 5,200) = 3,120 + 8,320 = 11,440 IOPS
  RAID 1/0: Total I/O = 0.6 × 5,200 + 2 × (0.4 × 5,200) = 3,120 + 4,160 = 7,280 IOPS

  Using FC drives, 15K rpm:
  RAID 5:   11,440 / 180 IOPS per disk = 64 disks
  RAID 1/0:  7,280 / 180 IOPS per disk = 41 disks
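The sizing arithmetic reduces to a short helper (a sketch in Python; the names are mine, while the write penalties and per-disk rates come from the slides):

```python
import math

# RAID write penalty: back-end operations per host write
WRITE_PENALTY = {"RAID 1/0": 2, "RAID 5": 4, "RAID 6": 6}
# Rule-of-thumb IOPS per drive at 70% utilization
DISK_IOPS = {"15K FC/SAS": 180, "10K FC/SAS": 140, "7.2K SATA": 80}

def disks_needed(host_iops, read_fraction, raid, disk="15K FC/SAS"):
    """Disks required to absorb the back-end load of a random workload."""
    reads = read_fraction * host_iops
    writes = (1 - read_fraction) * host_iops
    total = reads + WRITE_PENALTY[raid] * writes  # total back-end I/O
    return math.ceil(total / DISK_IOPS[disk])

# Slide example: 5,200 host IOPS, 60% reads, 15K FC drives
print(disks_needed(5200, 0.6, "RAID 5"))    # 64
print(disks_needed(5200, 0.6, "RAID 1/0"))  # 41
```

Running the same load through RAID 6 (penalty 6) would need 87 disks, which is why the slides reserve RAID 6 for low write percentages.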



Rules for Sequential Access: Layout

• 'Sequential' here means sequential as the primary I/O pattern, with I/O of 64 KB
  or larger
• Don't spread a LUN over many drives
  – Focus sequential streams on a small number of drives
  – You get the best bandwidth per stream with 1 stream per disk group
  – With FC, bandwidth per RAID group increases until about 4 streams
    ▪ But each added stream contributes incrementally less
    ▪ Use 1 or 2 streams for SATA drives
• The best RAID for sequential is RAID 3 or 5; RAID 6 is OK too
  – On reads, all drives are used in RAID 5 and RAID 6
  – On writes, less back-end load than RAID 1/0

  Full stripe write (256 KB):
    RAID 1/0 4+4: 512 KB transferred to disk
    RAID 5 4+1:   320 KB transferred to disk


Rules for Sequential Access: Layout

• Use your file system to force large I/O dispatch
  – maxphys on Solaris, VxFS write_pref_io, SG_SEGMENTS and MAX_SECTORS in the Linux
    HBA driver, etc.
• For large I/O cases (256 KB+), be square!
  – 2+1, 4+1, 8+1 result in stripe sizes that match the I/O
  – 2 × 128 KB = 256 KB, etc.
    ▪ This evens read access
    ▪ Allows cache bypass

  512 KB I/O on a 4+1 (256 KB stripe + parity): two full stripes – a clean fit
  512 KB I/O on a 5+1 (320 KB stripe + parity): the stripe does not divide the I/O evenly
Rules for Sequential Access: Layout

• How to 'supercharge' a single-threaded application
  – We see a lot of these in custom media and satellite/instrumentation download
  – Limited by the response time of the single outstanding request, and the max
    transfer of the storage system (1 MB per any one request)
• Use a host Logical Volume Manager (LVM) to create parallelism
  – You must be able to dispatch a large I/O to the LVM, filling the LVM stripe
  – The LVM then dispatches all stripe segments in parallel

  Single LUN: 1 MB I/O, response time 0.05 s → 1/0.05 = 20 IOPS → 20 MB/s
  Multiple LUNs (best if on separate RAID groups): a 4 MB I/O from the file system
  is split by the LVM into 4 × 1 MB stripe segments dispatched in parallel
  → 20 MB/s × 4 = 80 MB/s
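The single-thread throughput math can be sketched as follows (the function name is mine; the 1 MB / 0.05 s numbers are the slide's example):

```python
def stream_mb_s(io_mb, response_time_s, stripe_width=1):
    """Single-threaded throughput: one outstanding I/O of io_mb completes every
    response_time_s; an LVM stripe of stripe_width segments dispatched in
    parallel multiplies the effective bandwidth."""
    return (io_mb / response_time_s) * stripe_width

print(round(stream_mb_s(1, 0.05), 1))                  # 20.0 MB/s – single LUN
print(round(stream_mb_s(1, 0.05, stripe_width=4), 1))  # 80.0 MB/s – 4-way LVM stripe
```

The point of the sketch: with one thread, only a wider parallel dispatch (not faster disks) raises the top line, because the single request's response time is the limit.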
Rules for Sequential Access: The Cache

• Use the largest cache page size that makes sense
  – System shared with random I/O: 8 KB
  – Sequential-only system: 16 KB
• A larger read cache may help
  – 30% of cache in the larger systems
• Experiment with prefetch
  – The file system can do some strange things
  – Use Analyzer: "% Prefetches Used"
• You might go faster without read caching
  – Each read is a data copy to memory
  – If you are not using it (no read hits), turn it off and achieve higher system
    deliverable bandwidth
• The drives prefetch too!
  – Read hit from SP: 0.1 ms
  – Drive cache hit: 1 to 2 ms
  – Drive cache miss: 5 ms

[Example: a large file that the file system broke into sections, leaving 33 MB
between file accesses]


Rules for Sequential Access: Sizing

• Use Fibre Channel attach for high bandwidth
  – FC, SAS, or SATA drives on the back end
  – FC, SAS best if there is a mixed workload
• Use rule-of-thumb rates per disk:
  – Media is often 'randomized' sequential: file systems store files with gaps
    (see example)
  – FC/SAS 15K rpm: 12 MB/s
  – FC/SAS 10K rpm: 10 MB/s
  – SATA 7.2K rpm: 8 MB/s
• Note bus limits, front end and back end
  – Each 4 Gb FC port is good for about 360 MB/s
• Choose the right CLARiiON model

  Per-system MB/s maxima:
  Model     Reads   Writes
  CX4-960   2900    1100
  CX4-480   1100     550
  CX4-240   1100     500
  CX4-120    700     400
  AX4-5      800     350

[Example: a media file of 846,094 blocks (413 MB) with large gaps between file
accesses]
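The per-disk and per-port rules of thumb combine into a quick layout estimate (a sketch; the function name is mine, the rates are the slide's):

```python
import math

# Rule-of-thumb sequential MB/s per drive ('randomized' sequential, e.g. media)
DRIVE_MB_S = {"15K FC/SAS": 12, "10K FC/SAS": 10, "7.2K SATA": 8}
PORT_MB_S = 360  # per 4 Gb FC front-end port

def sequential_layout(target_mb_s, drive="15K FC/SAS"):
    """Drives and front-end ports needed to sustain a target bandwidth."""
    drives = math.ceil(target_mb_s / DRIVE_MB_S[drive])
    ports = math.ceil(target_mb_s / PORT_MB_S)
    return drives, ports

# 700 MB/s of media workload on 15K FC drives:
print(sequential_layout(700))  # (59, 2)
```

The result still has to fit under the per-system maxima in the table above – 700 MB/s of reads, for instance, rules out the CX4-120.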



Managing a Mixed Environment

Random and Sequential on the Same System

• Cache interaction: high-bandwidth writes filling the cache
  – Random writes get force-flushed, which increases response time
• Bus/memory/CPU saturation
  – Sequential operations use a lot of bandwidth and CPU resources
• Disk interactions
  – Disks shared between sequential and random workloads hurt both
    ▪ Large I/O slows down small I/O
    ▪ Random I/O breaks up sequential access at the drive
• However, it is practical and necessary to plan for shared use


Managing a Mixed Environment

Random and Sequential on the Same System: use different disks!

If on separate disks:
  LUN1: 1 thread of 8 KB reads, 5.5 ms per read → 180 IOPS/drive
  LUN2: 1 thread of 512 KB reads, 16 ms per read → 64 IOPS/drive

If on shared disks (both LUNs in one merged RAID group):
  Define a "transaction" as 1 visit by each thread.
  It will take 5.5 ms + 16 ms = 21.5 ms
  Number of transactions/s = 1000 / 21.5 ≈ 47

  LUN1 is now at ~47 IOPS (from 180 IOPS) – BAD IMPACT: the small random I/Os are
  stuck in line waiting for the big sequential I/O
  LUN2 is now at ~47 IOPS (from 64 IOPS) – modest impact


Managing a Mixed Environment

Random and Sequential on the Same System

• Techniques
  – Large I/O: bypass the write cache using write-aside
    ▪ Large I/O goes direct to disk, no slower than using a full write cache
  – Turn off read cache for LUNs not using it
    ▪ Oracle ASM LUNs, for example
  – Turn down prefetch
    ▪ Reduces disk and memory usage
  – Separate large/sequential loads from small random loads
    ▪ Different disks
    ▪ Different back-end buses
• Use Navisphere Quality of Service Manager
  – Throttle certain processes to allow more system resources for other processes
  – Favor small I/O over large I/O


Managing a Mixed Environment

Random and Sequential: Front-End effects – FC bus transfer time

I/O must be acknowledged (acked), and large transfers can delay the acks. Only one
transfer can be on a bus at any one time, so larger I/O can hold up small, fast I/O.
On the host–array link, writes share one direction with read acks, and reads share
the other direction with write acks.

Legend (time on the bus):
  512 KB read      1.5 ms
  8 KB read        0.1 ms
  8 KB write       0.1 ms
  Host read ack    0.01 ms
  Array write ack  0.01 ms

Example: a "transaction" of 7 reads and 8 writes of 8 KB, plus 6 reads of 512 KB.
On the read direction of the bus:
  6 × 512 KB read threads = 9 ms
  7 × 8 KB read threads   = 0.7 ms
  8 × write acks          = 0.08 ms
  Total bus time per "transaction" ≈ 9.78 ms (~10 ms)

The write acks that normally take 0.01 ms now wait several milliseconds to get
through – big impact!
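The bus-time arithmetic can be sketched directly (the names are mine; the per-transfer times are the slide's legend, summed for the read direction of the link):

```python
# Time on the array→host direction of the FC link: reads plus write acks
TRANSFER_MS = {"512KB_read": 1.5, "8KB_read": 0.1, "write_ack": 0.01}

def bus_transaction_ms(counts):
    """Serialized bus time for one 'transaction', given a count of each
    transfer type sharing the same direction of the link."""
    return sum(TRANSFER_MS[kind] * n for kind, n in counts.items())

t = bus_transaction_ms({"512KB_read": 6, "8KB_read": 7, "write_ack": 8})
print(round(t, 2))      # 9.78 ms per "transaction"
print(round(1000 / t))  # ~102 transactions/s
```

Note that the six large reads account for 9 of the 9.78 ms, which is why zoning bandwidth hosts to separate ports (next slide) helps so much.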



Managing a Mixed Environment

Random and Sequential: Front End

• Techniques
  – If possible, zone "bandwidth" hosts to different ports than "random" hosts
  – Have the client time high-bandwidth events off-hours from sensitive small-block
    operations (OLTP, reporting, etc.)
• Design
  – Two ports per system is enough for most hosts
    ▪ Performance per port: 70K IOPS or 360 MB/s
  – Best not to zone an HBA to more than 2 ports per SP unless other requirements
    demand it
    ▪ In an SP failure or NDU, each path must time out before PowerPath will
      trespass to the peer SP

[Diagram: host LUNs E:\, F:\, G:\, H:\ pathed to SP A and SP B; on SP failure, each
path times out before PowerPath trespasses the LUNs to the peer SP]



Summary

• These are rules for the 80% of you who are not running at the edge
• A few basic design strategies will solve your problems
  – Segregate random and sequential
  – Spread random wide, to spread peak loads
  – Concentrate sequential on fewer drives to reduce contention
• Have a rational view of drives and their capabilities, given the strange things
  host file systems can do
• Use the CLARiiON Best Practices Guide for Fibre Channel Storage as your reference
  for step-by-step design help

