VIDYASAGAR UNIVERSITY
A Seminar Report On
RAID
Submitted By
ASISH KUMAR DAS
MCA 3rd Semester
ROLL : VU/PG/20/32/02-IIIS NUMBER : 0058
REGN NO : 0000300 SESSION : 2017-2018
USIN : 1120102324
ACKNOWLEDGEMENT
I would like to thank respected …………..............................for giving me
such a wonderful opportunity to expand my knowledge of my own branch
and for giving me guidelines to present a seminar report. It helped me a lot to
realize what we study for.
Secondly, I would like to thank my parents, who patiently helped me as I
went through my work and helped me to modify and eliminate some of the
irrelevant or unnecessary material.
Thirdly, I would like to thank my friends who helped me to make my work
more organized and well-structured till the end.
Next, I would like to thank Microsoft for developing such a wonderful tool as MS
Word. It helped my work a lot to remain error-free.
Last but not the least, I would like to thank The Almighty for giving me the
strength to complete my report on time.
PREFACE
I have made this report file on the topic RAID; I have tried my best to
elucidate all the details relevant to the topic included in the report.
In the beginning, I have tried to give a general view of this topic.
My efforts and the wholehearted cooperation of everyone involved have
ended on a successful note. I express my sincere gratitude to
……………………………… who assisted me throughout the preparation
of this topic. I thank him for providing me the reinforcement, confidence,
and most importantly the track for the topic whenever I needed it.
RAID
The basic idea of RAID (Redundant Array of Independent Disks) is to combine multiple inexpensive disk
drives into an array of disk drives to obtain performance, capacity and reliability that exceed those of a single
large drive. The array of drives appears to the host computer as a single logical drive.
The Mean Time Between Failures (MTBF) of the array is equal to the
MTBF of an individual drive, divided by the number of drives in the array. Because of this, the MTBF of a non-
redundant array (RAID 0) is too low for mission-critical systems. However, disk arrays can be made fault-
tolerant by redundantly storing information in various ways.
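For example, under this simple model an array of four drives, each rated at 500,000 hours MTBF, has an array MTBF of only 125,000 hours. The following minimal Python sketch illustrates the calculation; the drive count and MTBF figures are illustrative assumptions, not vendor data:

    # Approximate MTBF of a non-redundant (RAID 0) array.
    # Model: the array fails when any one drive fails, so
    # MTBF_array = MTBF_drive / number_of_drives.

    def array_mtbf(drive_mtbf_hours: float, drive_count: int) -> float:
        # Expected MTBF of the whole array, in hours.
        return drive_mtbf_hours / drive_count

    # Illustrative figures: four drives, each rated at 500,000 hours.
    print(array_mtbf(500_000, 4))  # -> 125000.0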
Five types of array architectures, RAID 1 through RAID 5, were originally defined; each provides disk fault-tolerance
with different compromises in features and performance. In addition to these five redundant array
architectures, it has become popular to refer to a non-redundant array of disk drives as a RAID 0 array.
Disk Striping
Fundamental to RAID technology is striping. This is a method of
combining multiple drives into one logical storage unit. Striping partitions the storage space of each drive into stripes,
which can be as small as one sector (512 bytes) or as large as several megabytes. These stripes are then interleaved
in a rotating sequence, so that the combined space is
composed alternately of stripes from each drive. The specific type of
operating environment determines whether large or small stripes should be used.
Most operating systems today support concurrent disk I/O operations
across multiple drives. However, in order to maximize throughput for the disk subsystem, the I/O load must be
balanced across all the drives so that each drive can be kept busy as much as possible. In a multiple drive
system without striping, the disk I/O load is never perfectly balanced.
Some drives will contain data files that are frequently accessed and some drives will rarely be accessed.
By striping the drives in the array with stripes large enough so that each record falls entirely within one stripe, most
records can be evenly
distributed across all drives. This keeps all drives in the array busy during heavy load situations. This situation
allows all drives to work concurrently on different I/O operations, and thus maximize the number of
simultaneous I/O operations that can be performed by the array.
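As a sketch of how striping maps a logical address onto a physical drive, the following Python fragment shows the arithmetic for a simple RAID 0 layout; the stripe size, drive count and rotation order are illustrative assumptions, not a fixed standard:

    # Map a logical block number to (drive, stripe on that drive, offset)
    # for a RAID 0 layout with fixed-size stripes in rotating sequence.

    STRIPE_SIZE_BLOCKS = 128   # stripe depth in 512-byte sectors (assumed)
    DRIVE_COUNT = 4            # number of drives in the array (assumed)

    def locate(logical_block: int):
        stripe_number = logical_block // STRIPE_SIZE_BLOCKS
        drive = stripe_number % DRIVE_COUNT          # stripes rotate across drives
        stripe_on_drive = stripe_number // DRIVE_COUNT
        offset = logical_block % STRIPE_SIZE_BLOCKS  # position within the stripe
        return drive, stripe_on_drive, offset

    print(locate(0))    # (0, 0, 0)  first stripe sits on the first drive
    print(locate(128))  # (1, 0, 0)  the next stripe lands on the next drive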
Definition of RAID Levels
RAID 0 is typically defined as a group of striped disk drives without parity or data redundancy. RAID 0 arrays can
be configured with large stripes for multi-user environments or small stripes for single-user systems that
access long sequential records. RAID 0 arrays deliver the best data
storage efficiency and performance of any array type. The disadvantage is that if one drive in a RAID 0 array fails,
the entire array fails.
RAID 1, also known as disk mirroring, is simply a pair of disk drives that store duplicate data but appear to the
computer as a single drive. Although striping is not used within a single mirrored drive pair, multiple RAID 1
arrays can be striped together to create a single large array consisting of pairs of mirrored drives. All writes must go
to both drives of a mirrored pair so that the information on the drives is kept identical. However, each
individual drive can perform simultaneous, independent read operations.
Mirroring thus doubles the read performance of a single non-mirrored
drive, while the write performance is unchanged. RAID 1 delivers the best performance of any redundant array
type. In addition, there is less
performance degradation during drive failure than in RAID 5 arrays.
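The read/write behaviour of a mirrored pair can be sketched as follows; the class and its structure are purely illustrative, not a real controller interface:

    # Sketch of RAID 1 semantics: every write goes to both drives,
    # while reads may be served by either drive independently.

    class Raid1:
        def __init__(self, drive_a: dict, drive_b: dict):
            self.drives = [drive_a, drive_b]   # block number -> data
            self.next_reader = 0               # round-robin read balancing

        def write(self, block: int, data: bytes):
            for drive in self.drives:          # both copies stay identical
                drive[block] = data

        def read(self, block: int) -> bytes:
            drive = self.drives[self.next_reader]
            self.next_reader ^= 1              # alternate drives between reads
            return drive[block]

    array = Raid1({}, {})
    array.write(7, b"payload")
    assert array.read(7) == array.read(7) == b"payload"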
RAID 2 arrays sector-stripe data across groups of drives, with some
drives assigned to store ECC information. Because all disk drives today embed ECC information within each sector,
RAID 2 offers no significant advantages over other RAID architectures and is not supported by
Adaptec RAID controllers.
RAID 3, as with RAID 2, sector-stripes data across groups of drives, but one drive in the group is dedicated to
storing parity information. RAID 3
relies on the embedded ECC in each sector for error detection. In the case of drive failure, data recovery is
accomplished by calculating the exclusive OR (XOR) of the information recorded on the remaining drives. Records
typically span all drives, which optimizes the disk transfer rate. Because
each I/O request accesses every drive in the array, RAID 3 arrays can satisfy only one I/O request at a time. RAID 3
delivers the best
performance for single-user, single-tasking environments with long
records. Synchronized-spindle drives are required for RAID 3 arrays in order to avoid performance degradation with
short records. Because
RAID 5 arrays with small stripes can yield similar performance to RAID 3 arrays, RAID 3 is not supported by
Adaptec RAID controllers.
RAID 4 is identical to RAID 3 except that large stripes are used, so that records can be read from any individual
drive in the array (except the
parity drive). This allows read operations to be overlapped. However, since all write operations must update the
parity drive, they cannot be
overlapped. This architecture offers no significant advantages over other RAID levels and is not supported by
Adaptec RAID controllers.
RAID 5, sometimes called a Rotating Parity Array, avoids the write
bottleneck caused by the single dedicated parity drive of RAID 4. Under RAID 5, parity information is distributed
across all the drives. Since there is no dedicated parity drive, all drives contain data and read operations
can be overlapped on every drive in the array. Write operations will typically access one data drive and one parity
drive. However, because different records store their parity on different drives, write operations can usually be
overlapped.
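The rotation of the parity position can be sketched as follows; the left-symmetric rotation shown here is one common convention, and the exact layout used by any particular controller is an assumption:

    # Which drive holds the parity block for each stripe of a RAID 5 array?
    # Parity rotates across the drives so no single drive is a bottleneck.

    DRIVE_COUNT = 4  # illustrative

    def parity_drive(stripe_number: int) -> int:
        # Stripe 0 places parity on the last drive, stripe 1 on the
        # next one down, and so on, wrapping around.
        return (DRIVE_COUNT - 1) - (stripe_number % DRIVE_COUNT)

    for stripe in range(5):
        print(stripe, "-> parity on drive", parity_drive(stripe))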
In summary:
• RAID 0 is the fastest and most efficient array type but offers no fault-tolerance.
RAID 0 requires a minimum of two drives.
• RAID 1 is the best choice for performance-critical, fault-tolerant
environments. RAID 1 is the only choice for fault-tolerance if no more than two drives are
used.
• RAID 2 is seldom used today since ECC is embedded in all hard drives. RAID 2 is
not supported by Adaptec RAID controllers.
• RAID 3 can be used to speed up data transfer and provide fault-tolerance in
single-user environments that access long sequential
records. However, RAID 3 does not allow overlapping of multiple I/O operations and
requires synchronized-spindle drives to avoid
performance degradation with short records. Because RAID 5 with a small stripe size
offers similar performance, RAID 3 is not supported by Adaptec RAID controllers.
• RAID 4 offers no advantages over RAID 5 and does not support
multiple simultaneous write operations. RAID 4 is not supported by Adaptec RAID
controllers.
• RAID 5 combines efficient, fault-tolerant data storage with good
performance characteristics. However, write performance and
performance during drive failure are slower than with RAID 1. Rebuild
operations also require more time than with RAID 1 because parity
information must also be reconstructed. At least three drives are required
for RAID 5 arrays.
Dual-Level RAID
In addition to the standard RAID levels, Adaptec RAID controllers can combine
multiple hardware RAID arrays into a single array group or
parity group. In a dual-level RAID configuration, the controller firmware stripes two or
more hardware arrays into a single array.
NOTE: The arrays being combined must both use the same RAID level.
Dual-level RAID achieves a balance between the increased data availability inherent in
RAID 1 and RAID 5 and the increased read performance inherent in disk striping
(RAID 0). These arrays are sometimes referred to as RAID 0+1 (RAID 10) and
RAID 0+5 (RAID 50).
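A dual-level array such as RAID 10 can be sketched by striping across the mirrored pairs from the earlier RAID 1 example; the layout below is illustrative only:

    # Sketch of RAID 10: a RAID 0 stripe across two RAID 1 mirrored pairs.
    # Reuses the Raid1 class from the earlier sketch.

    class Raid10:
        def __init__(self, pairs: list):
            self.pairs = pairs                           # each element is a Raid1 mirror

        def write(self, block: int, data: bytes):
            pair = self.pairs[block % len(self.pairs)]   # stripe across the pairs
            pair.write(block // len(self.pairs), data)

        def read(self, block: int) -> bytes:
            pair = self.pairs[block % len(self.pairs)]
            return pair.read(block // len(self.pairs))

    array = Raid10([Raid1({}, {}), Raid1({}, {})])
    array.write(5, b"x")
    assert array.read(5) == b"x"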
Creating Data Redundancy
RAID 5 offers improved storage efficiency over RAID 1 because only the parity information is stored, rather
than a complete redundant copy of all data. The result is that three or more drives can be combined into a RAID
5 array, with the storage capacity of only one drive dedicated to
storing the parity information. Therefore, RAID 5 arrays provide greater storage efficiency than RAID 1 arrays.
However, this efficiency must be balanced against a corresponding loss in performance.
The parity data for each stripe of a RAID 5 array is the XOR of all the data in that stripe, across all the drives in
the array. When the data in a stripe is changed, the parity information is also updated. There are two ways to
accomplish this:
The first method is based on accessing all of the data in the modified
stripe and regenerating parity from that data. For a write that changes all the data in a stripe, parity can be
generated without having to read from the disk, because the data for the entire stripe will be in the
cache. This is known as a full-stripe write. If only some of the data in a stripe is to change, the missing data (the
data the host does not write) must be read from the disks to create the new parity. This is known as a
partial-stripe write. The efficiency of this method for a particular
write operation depends on the number of drives in the RAID 5 array and what portion of the complete stripe is
written.
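Full-stripe parity generation can be sketched in a few lines of Python; the stripe contents below are made up for the example:

    from functools import reduce

    def xor_blocks(a: bytes, b: bytes) -> bytes:
        # Byte-wise XOR of two equal-length blocks.
        return bytes(x ^ y for x, y in zip(a, b))

    def full_stripe_parity(data_blocks):
        # Parity for a full-stripe write: XOR of every data block in the stripe.
        return reduce(xor_blocks, data_blocks)

    stripe = [b"\x0f" * 4, b"\xf0" * 4, b"\xaa" * 4]   # three data drives (illustrative)
    print(full_stripe_parity(stripe).hex())            # the parity block for this stripe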
The second method of updating parity is to determine which data bits were changed by the write operation and
then change only the
corresponding parity bits. This is done by first reading the old data
which is to be overwritten. This data is then XORed with the new data that is to be written. The result is a bit
mask which has a 1 in the
position of every bit which has changed. This bit mask is then XORed with the old parity information from the
array. This results in the
corresponding bits being changed in the parity information. The new updated parity is then written back to the
array. This results in two
reads, two writes and two XOR operations. This is known as read-modify-write.
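The read-modify-write sequence can be sketched as follows, reusing the xor_blocks helper from the previous sketch; the block contents are illustrative:

    def read_modify_write(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
        # Two reads (old data, old parity) and two XORs produce the new
        # parity; two writes (new data, new parity) complete the update.
        change_mask = xor_blocks(old_data, new_data)   # 1 bits mark changed positions
        return xor_blocks(old_parity, change_mask)     # flip those bits in the parity

    old_data, new_data, old_parity = b"\x0f" * 4, b"\x1f" * 4, b"\x55" * 4
    print(read_modify_write(old_data, new_data, old_parity).hex())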
The cost of storing parity, rather than redundant data as in RAID 1, is the extra time required for the write operations
to regenerate the parity
information. This additional time results in slower write performance for RAID 5 arrays than for RAID 1. Because
Adaptec RAID controllers generate XOR in hardware, the negative effect of parity generation comes primarily from the
additional disk I/O required to read the missing information and write the new parity. Adaptec RAID controllers can
generate parity using either the full- or partial-stripe write algorithm or the read-modify-write
algorithm. The parity update method chosen for any given write operation is determined by calculating the number
of I/O operations
needed for each type and choosing the one with the smallest result. To increase the number of full-stripe writes, the
cache is used to combine small write operations into larger blocks of data.
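The comparison of I/O counts can be sketched as below; this counting model is a simplification and an assumption, not the actual firmware logic of any controller:

    def io_count_partial_stripe(drive_count: int, blocks_written: int) -> int:
        # Read the data blocks the host did not write, then write the
        # new data blocks plus one parity block.
        blocks_missing = (drive_count - 1) - blocks_written
        return blocks_missing + blocks_written + 1

    def io_count_read_modify_write(blocks_written: int) -> int:
        # Read each old data block and the old parity; write each new
        # data block and the new parity.
        return 2 * blocks_written + 2

    # With 8 drives and only 1 block changing, read-modify-write wins:
    print(io_count_partial_stripe(8, 1))   # 8 I/O operations
    print(io_count_read_modify_write(1))   # 4 I/O operations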
Handling I/O Errors
Adaptec RAID controllers maintain two lists for each RAID 5 array: a Bad Parity List and a Bad Data List. These
lists contain the physical block
number of any parity or data block that could not be successfully written during normal write, rebuild or dynamic
array expansion operations.
These lists alert the controller that the data or parity in these blocks is not valid. If the controller subsequently needs
data from a listed block and
cannot recreate the data from existing redundant data, it returns an error condition to the host.
Blocks are removed from the Bad Parity List or the Bad Data List if the controller successfully writes to them on a
subsequent attempt.
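This bookkeeping can be sketched with two sets keyed by physical block number; the structure and names below are illustrative, not Adaptec's implementation:

    # Track blocks whose parity or data could not be written successfully.
    bad_parity: set = set()
    bad_data: set = set()

    def record_write_failure(block: int, is_parity: bool):
        (bad_parity if is_parity else bad_data).add(block)

    def record_write_success(block: int, is_parity: bool):
        # A later successful write removes the block from its list.
        (bad_parity if is_parity else bad_data).discard(block)

    def block_is_valid(block: int) -> bool:
        # Data from a listed block is invalid; the controller must
        # recreate it from redundancy or return an error to the host.
        return block not in bad_data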
Degraded Mode
When a drive fails in a RAID 0 array, the entire array fails. In a RAID 1
array, a failed drive reduces read performance by 50%, as data can only be read from the remaining drive.
Write performance is increased slightly
because only one drive is accessed. A RAID array operating with a failed drive is said to be in degraded
mode.
When a drive fails in a RAID 5 array, the requested data is synthesized by reading and XORing the
corresponding data stripes from the remaining drives in the array. For RAID 5, the magnitude of the
performance impact in degraded mode depends on the number of drives in the array: an array with a large
number of drives will experience more performance degradation than an array with a small number of drives.
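The reconstruction itself is the same XOR used to create the parity, applied to the surviving drives; a minimal sketch, again reusing the xor_blocks helper (stripe contents are illustrative):

    from functools import reduce

    def reconstruct_missing(surviving_blocks):
        # Rebuild the block from the failed drive: XOR of all surviving
        # blocks in the stripe (the data blocks plus the parity block).
        return reduce(xor_blocks, surviving_blocks)

    d0, d1, d2 = b"\x0f" * 4, b"\xf0" * 4, b"\xaa" * 4
    parity = xor_blocks(xor_blocks(d0, d1), d2)
    # Suppose the drive holding d1 fails; XOR the survivors to recover it:
    assert reconstruct_missing([d0, d2, parity]) == d1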
Rebuilding a Failed Hard Drive
A failed drive in a RAID 1 or RAID 5 array can be replaced by physically removing
and replacing the drive, or by a designated Hot Spare.
Adaptec RAID controllers will rebuild the data for the failed drive onto the new drive or Hot Spare. This
rebuild operation occurs online while normal host reads and writes are being processed by the array.
RAID 1 arrays are rebuilt relatively quickly, because the data is simply copied from the duplicate
(mirrored) drive to the replacement drive. For RAID 5 arrays, the data for the replacement drive must be
synthesized by reading and XORing the corresponding stripes from the remaining drives in the array. RAID
5 arrays that contain a large number of drives will
require more time for a rebuild than a small array.