Getting Started With ZFS "Zettabyte File System" in Solaris.
ZFS stands for Zettabyte File System. ZFS is a 128-bit filesystem that was first
introduced in June 2006 with the Solaris 10 6/06 release. ZFS allows 256 quadrillion
zettabytes of storage, which effectively means there is no limit on the number of
filesystems or on the number of files/directories that can exist on ZFS.
ZFS does not replace any traditional filesystem (UFS in Solaris), and it does not
improve any existing UFS technology. Instead it is a new approach to managing
data in Solaris. ZFS is more robust, scalable and easier to administer than a
traditional FS, but it will take some time to capture the market and to replace
UFS, the most stable FS for Solaris to date.
ZFS Pools
ZFS pools are the storage pools used to manage physical disks/storage. With the
traditional UFS filesystem we used to partition the disk and then create filesystems
on the slices. In ZFS the approach is completely different: we create a pool of
block devices (disks), and the filesystems are created from the pool. This means
whatever disks are free can be used to create filesystems as per requirement. You
can think of pools as the disk groups used in VxVM.
ZFS requirements
1.) ZFS is fully supported on SPARC and x86 Solaris boxes.
2.) ZFS is supported on Solaris 10; the minimum release level is Solaris 10 6/06.
3.) The recommended memory is 1GB or higher.
4.) The minimum disk size for ZFS is 128MB as per the documentation, and the
minimum disk space for a storage pool is approximately 64MB.
ZFS Terminology
Checksum - A 256-bit hash of the data in a FS block.
Clone - A FS whose contents are identical to the contents of a ZFS snapshot.
Dataset - A generic name for the following ZFS entities: clones, filesystems,
snapshots & volumes. Each dataset is identified by a unique name in the ZFS
namespace.
Default FS - A file system that is created by default when using Solaris Live
Upgrade to migrate from UFS to a ZFS root FS. The current set of default
filesystems is /, /usr, /opt & /var.
ZFS FS - A ZFS dataset that is mounted within the standard system namespace
and behaves like other traditional filesystems.
Mirror - A virtual device, also called a RAID-1 device, that stores identical copies
of data on two or more disks.
Pool - A logical group of block devices describing the layout & physical
characteristics of the available storage. Space for datasets is allocated from a
pool. Also called a storage pool or simply a pool.
RAID-Z - A virtual device that stores data and parity on multiple disks, similar to
RAID-5.
Resilvering - The process of transferring data from one device to another device.
For example, when a mirror component is taken offline and then later put back
online, the data from the up-to-date mirror component is copied to the newly
restored mirror component. This process is called mirror resynchronization in
traditional volume management products.
Shared FS - The set of file systems that are shared between the alternate boot
environment and the primary boot environment. This set includes file systems
such as /export & the area reserved for swap. Shared filesystems might also
contain zone roots.
Snapshot - A read-only image of a FS or volume at a given point in time.
Virtual Device - A logical device in a pool, which can be a physical device, a file or
a collection of devices.
Volume - A dataset used to emulate a physical device. For example, you
can create a ZFS volume for use as a swap device.
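For example (a minimal sketch; the pool name mypool and the 1GB size are only placeholders), a ZFS volume can be created and added as a swap device like this:
# zfs create -V 1gb mypool/swapvol
# swap -a /dev/zvol/dsk/mypool/swapvol
# swap -l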
ZFS RAID Configurations:
========================
ZFS supports 3 RAID configurations as given below:
1.) RAID-0 : Data is distributed across one or more disks. There is no redundancy in
RAID-0. If any disk fails, all data will be lost. That is the reason RAID-0 is the least
preferred.
2.) RAID-1 : Two exact copies of the data are retained on the server. There will be
no data loss as long as at least one mirror component survives. This is the most
commonly used RAID in any volume manager.
3.) RAID-Z : RAID-Z is similar to RAID-5.
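For quick reference, the above layouts map to zpool create as shown below (a minimal sketch; the pool names and the disk names c1t1d0, c1t2d0 and c1t3d0 are only placeholders):
RAID-0 : # zpool create stripepool c1t1d0 c1t2d0
RAID-1 : # zpool create mirrpool mirror c1t1d0 c1t2d0
RAID-Z : # zpool create rzpool raidz c1t1d0 c1t2d0 c1t3d0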
Creation Of Basic ZFS:
======================
I am creating a pool named mypool consisting of one disk, c1t0d0. It is a
RAID-0 pool.
Note: You can use the -f option in case of any errors during zpool creation.
# zpool create mypool c1t0d0
# df -h mypool
Filesystem size used avail capacity Mounted on
mypool 67G 21K 67G 1% /mypool
This output shows a pool named mypool and a ZFS filesystem /mypool
which can be used to store data. ZFS creates and mounts this directory itself.
We can check the space availability by using the zpool list command as shown below.
# zpool list
NAME SIZE ALLOC FREE CAP HEALTH ALTROOT
mypool 68G 124K 68.0G 0% ONLINE
Any errors in the zpool can be checked with the zpool status command as shown below:
# zpool status
pool: mypool
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
c1t0d0 ONLINE 0 0 0
errors: No known data errors
Now I will create a new ZFS filesystem using the same zpool. As zpool list
shows that the zpool has 68GB of space, we can use the free space to
create a new FS.
# zfs create mypool/myfs
# df -h /mypool/myfs
Filesystem size used avail capacity Mounted on
mypool/myfs 67G 21K 67G 1% /mypool/myfs
This is how a simple ZFS FS is created. Next I will show how to change
some of the FS properties which are most commonly used in ZFS.
I will create a new ZFS FS named mypool/myquotefs and set its quota to
20GB. This will prevent this FS from consuming all the space in the pool and
starving the other filesystems in the pool.
# zfs create mypool/myquotefs
# zfs set quota=20g mypool/myquotefs
# df -h /mypool/myquotefs
Filesystem size used avail capacity Mounted on
mypool/myquotefs 20G 21K 20G 1% /mypool/myquotefs
Here I will change the FS mountpoint to a desired name by changing the ZFS
mountpoint property.
# zfs set mountpoint=/test mypool/myfs
# df -h /test
Filesystem size used avail capacity Mounted on
mypool/myfs 67G 21K 67G 1% /test
#
zfs list will list all the active ZFS filesystems and volumes on the server:
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
mypool 140K 66.9G 23K /mypool
mypool/myfs 21K 66.9G 21K /test
mypool/myquotefs 21K 20.0G 21K /mypool/myquotefs
The -r option is used to recursively list the datasets under the given dataset or
pool name, as shown below:
# zfs list -r mypool/myfs
NAME USED AVAIL REFER MOUNTPOINT
mypool/myfs 21K 66.9G 21K /test
#
# zpool status
pool: mypool
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
c1t0d0 ONLINE 0 0 0
errors: No known data errors
Note: To check all the properties you can use zfs get all; I will show the output of
the same later in this post.
Renaming a ZFS FS:
We can rename a ZFS FS using the zfs rename command. Note that it takes
dataset names, not mount paths. Below is the syntax for the same:
# zfs rename <existing-dataset-name> <new-dataset-name>
Eg: # zfs rename mypool/myquotefs mypool/new-myquotefs
Mirroring in ZFS:
=================
I am going to demonstrate mirroring of a ZFS pool and operations like taking a
disk out of the ZFS pool and inserting a disk into the pool. You will clearly notice the
status while inserting and removing the disk. You will also see how to check the
ZFS parameters and how to change them using the zfs set command. I will also
show how to destroy a ZFS FS and a ZFS pool. This will give you the basic
platform to get up to speed with the ZFS FS, which is going to be the primary
FS in Solaris 11.
Note: You will also find the procedure to change a disk under ZFS in the post
below. You need to take the disk offline from the respective pool, replace the
disk with the new one, bring the replaced disk online again in the respective pool
and monitor until it is completely resynced.
# zpool attach -f mypool c1t0d0 c1t2d0
Note: Both disks should be of the same size, or the new disk being attached as a
mirror should be larger than the existing one.
# zpool status
pool: mypool
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Sun Oct 9 22:56:23 2011
config:
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c1t0d0 ONLINE 0 0 0
c1t2d0 ONLINE 0 0 0 143K resilvered
errors: No known data errors
# zpool list
NAME SIZE ALLOC FREE CAP HEALTH ALTROOT
mypool 68G 156K 68.0G 0% ONLINE
# zpool iostat
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
mypool 156K 68.0G 0 0 85 489
# zpool status -x
all pools are healthy
# zpool status
pool: mypool
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Sun Oct 9 22:56:23 2011
config:
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c1t0d0 ONLINE 0 0 0
c1t2d0 ONLINE 0 0 0 143K resilvered
errors: No known data errors
# zpool detach mypool c1t0d0
# zpool status
pool: mypool
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Sun Oct 9 22:56:23 2011
config:
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
c1t2d0 ONLINE 0 0 0 143K resilvered
errors: No known data errors
#
# zpool attach -f mypool c1t2d0 c1t0d0
# zpool status
pool: mypool
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Sun Oct 9 23:02:54 2011
config:
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c1t2d0 ONLINE 0 0 0
c1t0d0 ONLINE 0 0 0 154K resilvered
errors: No known data errors
#
# zpool offline mypool c1t2d0
# zpool status
pool: mypool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using zpool online or replace the device with
zpool replace.
scrub: resilver completed after 0h0m with 0 errors on Sun Oct 9 23:02:54 2011
config:
NAME STATE READ WRITE CKSUM
mypool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c1t2d0 OFFLINE 0 0 0
c1t0d0 ONLINE 0 0 0 154K resilvered
errors: No known data errors
# zpool online mypool c1t2d0
# zpool status
pool: mypool
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Sun Oct 9 23:03:57 2011
config:
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c1t2d0 ONLINE 0 0 0 24K resilvered
c1t0d0 ONLINE 0 0 0
errors: No known data errors
#
# zpool history mypool
History for mypool:
2011-10-09.22:47:43 zpool create mypool c1t0d0
2011-10-09.22:49:15 zfs create mypool/myfs
2011-10-09.22:51:49 zfs create mypool/myquotefs
2011-10-09.22:52:01 zfs set quota=20g mypool/myquotefs
2011-10-09.22:53:30 zfs set mountpoint=/test mypool/myfs
2011-10-09.22:56:24 zpool attach -f mypool c1t0d0 c1t2d0
2011-10-09.23:01:49 zpool detach mypool c1t0d0
2011-10-09.23:02:54 zpool attach -f mypool c1t2d0 c1t0d0
2011-10-09.23:03:25 zpool offline mypool c1t2d0
2011-10-09.23:03:57 zpool online mypool c1t2d0
# zpool history -l mypool
History for mypool:
2011-10-09.22:47:43 zpool create mypool c1t0d0 [user root on yogesh-test:global]
2011-10-09.22:49:15 zfs create mypool/myfs [user root on yogesh-test:global]
2011-10-09.22:51:49 zfs create mypool/myquotefs [user root on yogesh-test:global]
2011-10-09.22:52:01 zfs set quota=20g mypool/myquotefs [user root on yogesh-test:global]
2011-10-09.22:53:30 zfs set mountpoint=/test mypool/myfs [user root on yogesh-test:global]
2011-10-09.22:56:24 zpool attach -f mypool c1t0d0 c1t2d0 [user root on yogesh-test:global]
2011-10-09.23:01:49 zpool detach mypool c1t0d0 [user root on yogesh-test:global]
2011-10-09.23:02:54 zpool attach -f mypool c1t2d0 c1t0d0 [user root on yogesh-test:global]
2011-10-09.23:03:25 zpool offline mypool c1t2d0 [user root on yogesh-test:global]
2011-10-09.23:03:57 zpool online mypool c1t2d0 [user root on yogesh-test:global]
#
# zpool status
pool: mypool
state: ONLINE
scrub: scrub completed after 0h0m with 0 errors on Sun Oct 9 23:06:37 2011
config:
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c1t2d0 ONLINE 0 0 0
c1t0d0 ONLINE 0 0 0
errors: No known data errors
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
mypool 155K 66.9G 23K /mypool
mypool/myfs 21K 66.9G 21K /test
mypool/myquotefs 21K 20.0G 21K /mypool/myquotefs
# zfs get all mypool
NAME PROPERTY VALUE SOURCE
mypool type filesystem
mypool creation Sun Oct 9 22:47 2011
mypool used 155K
mypool available 66.9G
mypool referenced 23K
mypool compressratio 1.00x
mypool mounted yes
mypool quota none default
mypool reservation none default
mypool recordsize 128K default
mypool mountpoint /mypool default
mypool sharenfs off default
mypool checksum on default
mypool compression off default
mypool atime on default
mypool devices on default
mypool exec on default
mypool setuid on default
mypool readonly off default
mypool zoned off default
mypool snapdir hidden default
mypool aclmode groupmask default
mypool aclinherit restricted default
mypool canmount on default
mypool shareiscsi off default
mypool xattr on default
mypool copies 1 default
mypool version 4
mypool utf8only off
mypool normalization none
mypool casesensitivity sensitive
mypool vscan off default
mypool nbmand off default
mypool sharesmb off default
mypool refquota none default
mypool refreservation none default
mypool primarycache all default
mypool secondarycache all default
mypool usedbysnapshots 0
mypool usedbydataset 23K
mypool usedbychildren 132K
mypool usedbyrefreservation 0
mypool logbias latency default
Note: Below I override the mountpoint that the child filesystems inherit, which
means the filesystems that were mounted automatically by ZFS will no longer be
visible; zfs inherit then restores the default behaviour. An example is given below.
Note: I suggest you go through the ZFS FS property index table, which you can
easily find on Google or the Sun site. That will give you a better idea about all the
parameters that can be changed and the effect they have.
# zfs set mountpoint=none mypool
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
mypool 164K 66.9G 23K none
mypool/myfs 21K 66.9G 21K /test
mypool/myquotefs 21K 20.0G 21K none
# df -k
Filesystem kbytes used avail capacity Mounted on
/dev/vx/dsk/bootdg/rootvol
8262869 5102426 3077815 63% /
/devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 23798088 1624 23796464 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
/platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap1.so.1
8262869 5102426 3077815 63% /platform/sun4u-us3/lib/libc_psr.so.1
/platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
8262869 5102426 3077815 63% /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
fd 0 0 0 0% /dev/fd
swap 23796480 16 23796464 1% /tmp
swap 23796528 64 23796464 1% /var/run
swap 23796464 0 23796464 0% /dev/vx/dmp
swap 23796464 0 23796464 0% /dev/vx/rdmp
/dev/vx/dsk/bootdg/var_crash
20971520 71784 19593510 1% /var/crash
mypool/myfs 70189056 21 70188892 1% /test
# zfs inherit mountpoint mypool
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
mypool 164K 66.9G 23K /mypool
mypool/myfs 21K 66.9G 21K /test
mypool/myquotefs 21K 20.0G 21K /mypool/myquotefs
# df -k
Filesystem kbytes used avail capacity Mounted on
/dev/vx/dsk/bootdg/rootvol
8262869 5102427 3077814 63% /
/devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 23796840 1624 23795216 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
/platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap1.so.1
8262869 5102427 3077814 63% /platform/sun4u-us3/lib/libc_psr.so.1
/platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
8262869 5102427 3077814 63% /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
fd 0 0 0 0% /dev/fd
swap 23795232 16 23795216 1% /tmp
swap 23795280 64 23795216 1% /var/run
swap 23795216 0 23795216 0% /dev/vx/dmp
swap 23795216 0 23795216 0% /dev/vx/rdmp
/dev/vx/dsk/bootdg/var_crash
20971520 71784 19593510 1% /var/crash
mypool/myfs 70189056 21 70188892 1% /test
mypool 70189056 23 70188892 1% /mypool
mypool/myquotefs 20971520 21 20971499 1% /mypool/myquotefs
#
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
mypool 164K 66.9G 23K /mypool
mypool/myfs 21K 66.9G 21K /test
mypool/myquotefs 21K 20.0G 21K /mypool/myquotefs
# zfs destroy mypool/myquotefs
# zfs destroy mypool/myfs
# zpool destroy mypool
#
#
# zfs list
no datasets available
# zpool list
no pools available
#
I hope this post will help beginners get up to speed with ZFS. I will try to
cover more complex ZFS tasks in my coming posts. This is just an overview and I
think it is useful for many Solaris administrators. :-)
In our earlier post ( To get Started with ZFS ), Yogesh discussed various
ZFS pool and file system operations. In this post I will be demonstrating the
redundancy capability of the different ZFS pools and also the recovery procedure
for disk failure scenarios. I performed this lab on Solaris 11; the instructions are
the same for Solaris 10 though.
Quick Recap about ZFS Pools
1. Simple and Striped Pool ( Equivalent to RAID-0; data is non-redundant )
2. Mirrored Pool ( Equivalent to RAID-1 )
3. Raidz pool ( Equivalent to single-parity RAID-5; can withstand up to one
disk failure )
4. Raidz-2 pool ( Equivalent to dual-parity RAID-5; can withstand up to two disk
failures )
5. Raidz-3 pool ( Equivalent to triple-parity RAID-5; can withstand up to three
disk failures )
RAIDZ Configuration Requirements and Recommendations
A RAIDZ configuration with N disks of size X with P parity disks can hold
approximately (N-P)*X bytes and can withstand P device(s) failing before data
integrity is compromised.
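As a quick worked example of the (N-P)*X estimate (the sizes are only illustrative): a raidz2 group of six 2 TB disks (N = 6, P = 2) holds approximately (6-2)*2 TB = 8 TB of usable space and can survive any two of its disks failing.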
Start a single-parity RAIDZ (raidz) configuration at 3 disks (2+1)
Start a double-parity RAIDZ (raidz2) configuration at 6 disks (4+2)
Start a triple-parity RAIDZ (raidz3) configuration at 9 disks (6+3)
(N+P) with P = 1 (raidz), 2 (raidz2), or 3 (raidz3) and N equals 2, 4, or 6
The recommended number of disks per group is between 3 and 9. If you have
more disks, use multiple groups
A general consideration is whether your goal is to maximize disk space or
maximize performance.
A RAIDZ configuration maximizes disk space and generally performs well when
data is written and read in large chunks (128K or more).
A RAIDZ-2 configuration offers better data availability, and performs similarly to
RAIDZ. RAIDZ-2 has significantly better mean time to data loss (MTTDL) than
either RAIDZ or 2-way mirrors.
A RAIDZ-3 configuration maximizes disk space and offers excellent availability
because it can withstand 3 disk failures.
A mirrored configuration consumes more disk space but generally performs
better with small random reads.
Disk Failure Scenario for Simple/Striped ZFS Non Redundant Pool
Disk Configuration:
root@gurkulunix3:~# echo|format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c3t0d0 <SUN ZFS 7120 HARDDISK-1.0 cyl 8351 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t4d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@4,0
3. c3t5d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@5,0
4. c3t6d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@6,0
Creating Simple ZFS Storage Pool
root@gurkulunix3:/dev/chassis# zpool create poolnr c3t2d0 c3t3d0
poolnr successfully created, but with no redundancy; failure of one
device will cause loss of the pool
root@gurkulunix3:/dev/chassis# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
poolnr 3.97G 92.5K 3.97G 0% 1.00x ONLINE
rpool 63.5G 5.21G 58.3G 8% 1.00x ONLINE
Creating Sample Filesystem for new pool
root@gurkulunix3:/dev/chassis# zfs create poolnr/testfs
root@gurkulunix3:/downloads# zpool status poolnr
pool: poolnr
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
poolnr ONLINE 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
errors: No known data errors
After Manual Simulation of the Disk ( c3t2d0) failure:
root@gurkulunix3:~# echo|format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c3t0d0 <SUN ZFS 7120 HARDDISK-1.0 cyl 8351 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0 <drive type unknown>
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t4d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@4,0
3. c3t5d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@5,0
4. c3t6d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@6,0
root@gurkulunix3:~# zpool status poolnr
pool: poolnr
state: UNAVAIL
status: One or more devices are faulted in response to persistent errors. There
are insufficient replicas for the pool to
continue functioning.
action: Destroy and re-create the pool from a backup source. Manually marking
the device
repaired using zpool clear may allow some data to be recovered.
scan: none requested
config:
NAME STATE READ WRITE CKSUM
poolnr UNAVAIL 0 0 0 insufficient replicas
c3t2d0 FAULTED 0 0 0 too many errors
c3t6d0 ONLINE 0 0 0
From the above scenario it is observed that a simple ZFS pool cannot
withstand any disk failure.
Disk Failure Scenario for Mirror Pool
Initial Disk Configuration
root@gurkulunix3:~# echo|format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c3t0d0 <SUN ZFS 7120 HARDDISK-1.0 cyl 8351 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@4,0
4. c3t7d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):
Create Mirror Pool
root@gurkulunix3:~# zpool create mpool mirror c3t4d0 c3t7d0
root@gurkulunix3:~# zfs create mpool/mtestfs
>>> Copy Some Sample data to new file system
root@gurkulunix3:~# df -h|grep /mpool/mtestfs
mpool 2.0G 32K 2.0G 1% /mpool
mpool/mtestfs 2.0G 31K 2.0G 1% /mpool/mtestfs
root@gurkulunix3:~# zpool status mpool
pool: mpool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
mpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0
errors: No known data errors
After Manually simulating the Disk Failure
root@gurkulunix3:~# echo|format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c3t0d0 <SUN ZFS 7120 HARDDISK-1.0 cyl 8351 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@4,0
Specify disk (enter its number): Specify disk (enter its number):
<== we lost the disk c3t7d0
Checking pool Status after Disk Failure
root@gurkulunix3:~# zpool status mpool
pool: mpool
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using zpool online.
see: http://www.sun.com/msg/ZFS-8000-2Q
scan: none requested
config:
NAME STATE READ WRITE CKSUM
mpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 UNAVAIL 0 0 0 cannot open
errors: No known data errors
After physically Replacing the Failed disk ( placing new disk in same location)
root@gurkulunix3:~# echo|format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c3t0d0 <SUN ZFS 7120 HARDDISK-1.0 cyl 8351 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@4,0
4. c3t7d0 <SUN ZFS 7120 HARDDISK-1.0 cyl 1022 alt 2 hd 128 sec 32>
/pci@0,0/pci8086,2829@d/disk@7,0 << New Disk
>>> Label new disk with SMI Label ( A requirement to attach to ZFS pool)
root@gurkulunix3:~# format -L vtoc -d c3t7d0
Searching for disks...done
selecting c3t7d0
[disk formatted]
c3t7d0 is labeled with VTOC successfully.
Replace the Failed Disk Component from the ZFS pool
root@gurkulunix3:~# zpool replace mpool c3t7d0
root@gurkulunix3:~# zpool status -x mpool
pool mpool is healthy
root@gurkulunix3:~# zpool status mpool
pool: mpool
state: ONLINE
scan: resilvered 210M in 0h0m with 0 errors on Sun Sep 16 10:41:21 2012
config:
NAME STATE READ WRITE CKSUM
mpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0 <<< Disk Online
errors: No known data errors
root@gurkulunix3:~#
Single and Double Disk Failure Scenarios for ZFS Raid-Z Pool
Disk Configuration Available for new Raid-Z pool Creation
root@gurkulunix3:~# echo|format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c3t0d0 <SUN ZFS 7120 HARDDISK-1.0 cyl 8351 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@4,0
4. c3t7d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):
Creating New RaidZ Pool
root@gurkulunix3:~# zpool create rzpool raidz c3t2d0 c3t3d0 c3t4d0 c3t7d0
invalid vdev specification
use -f to override the following errors:
/dev/dsk/c3t2d0s0 is part of exported or potentially active ZFS pool poolnr.
Please see zpool(1M).
==> Here we had an issue with one of the disks we selected for the pool; the
reason is that the disk was already used by another zpool earlier. That old zpool is
no longer available, and we want to reuse the disk for the new zpool.
==> We can solve the problem in two ways: 1. Use the -f option to
override the configuration. 2. Reinitialize the partition table of the disk ( Solaris
x86 only ).
==> In this example I have reinitialized the whole disk as a Solaris
partition with the below command:
root@gurkulunix3:~# fdisk -B /dev/rdsk/c3t3d0p0
root@gurkulunix3:~# zpool create rzpool raidz c3t2d0 c3t3d0 c3t4d0 c3t7d0
root@gurkulunix3:~# zpool status rzpool
pool: rzpool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rzpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0
errors: No known data errors
Create File system and Copy some test data to rzpool/r5testfs
root@gurkulunix3:~# zfs create rzpool/r5testfs
root@gurkulunix3:/downloads# df -h|grep test
rzpool/r5testfs 5.8G 575M 5.3G 10% /rzpool/r5testfs
root@gurkulunix3:/downloads# cd /rzpool/r5testfs/
root@gurkulunix3:/rzpool/r5testfs# ls -l
total 1176598
-rw-r--r-- 1 root root 602057762 Sep 16 11:09 OLE6-U2-VM-Template.zip
root@gurkulunix3:/rzpool/r5testfs#
After Manual Simulation of the Disk failure ( i.e. c3t7d0)
root@gurkulunix3:~# echo|format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c3t0d0 <SUN ZFS 7120 HARDDISK-1.0 cyl 8351 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@4,0
<<== c3t7d0 missing
Specify disk (enter its number): Specify disk (enter its number):
Checking the zpool Status it is in Degraded State
root@gurkulunix3:~# zpool status -x rzpool
pool: rzpool
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using zpool online.
see: http://www.sun.com/msg/ZFS-8000-2Q
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rzpool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 UNAVAIL 0 0 0 cannot open
errors: No known data errors
Checking if the File system is Still Accessible
root@gurkulunix3:~# df -h |grep testfs
rzpool/r5testfs 5.8G 575M 5.3G 10% /rzpool/r5testfs
root@gurkulunix3:~# cd /rzpool/r5testfs
root@gurkulunix3:/rzpool/r5testfs# ls -l
total 1176598
-rw-r--r-- 1 root root 602057762 Sep 16 11:09 OLE6-U2-VM-Template.zip
root@gurkulunix3:/rzpool/r5testfs#
After replacing the failed disk with new disk, in the same location
root@gurkulunix3:~# echo|format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c3t0d0 <SUN ZFS 7120 HARDDISK-1.0 cyl 8351 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@3,0
3. c3t4d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@4,0
4. c3t7d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):
root@gurkulunix3:~# zpool status -x
pool: rzpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using zpool replace.
see: http://www.sun.com/msg/ZFS-8000-4J
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rzpool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 FAULTED 0 0 0 corrupted data <<== the state changed to FAULTED because the zpool can see the new disk, but it has no/corrupted data
errors: No known data errors
Replacing the Failed Disk Component in the Zpool
root@gurkulunix3:~# zpool replace rzpool c3t7d0
invalid vdev specification
use -f to override the following errors:
/dev/dsk/c3t7d0s0 is part of exported or potentially active ZFS pool mpool.
Please see zpool(1M).
root@gurkulunix3:~# zpool replace -f rzpool c3t7d0 <<== using -f option to
override above message
root@gurkulunix3:~# zpool status -x
all pools are healthy
root@gurkulunix3:~# zpool status rzpool
pool: rzpool
state: ONLINE
scan: resilvered 192M in 0h1m with 0 errors on Sun Sep 16 11:50:49 2012
config:
NAME STATE READ WRITE CKSUM
rzpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0
errors: No known data errors
Two-Disk Failure Scenario for a RaidZ Pool (and it fails)
Zpool Status Before Disk Failure
root@gurkulunix3:~# zpool status rzpool
pool: rzpool
state: ONLINE
scan: resilvered 192M in 0h1m with 0 errors on Sun Sep 16 11:50:49 2012
config:
NAME STATE READ WRITE CKSUM
rzpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0
Disk Configuration After Simulating double disk failure
root@gurkulunix3:~# echo|format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c3t0d0 <SUN ZFS 7120 HARDDISK-1.0 cyl 8351 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c3t2d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@2,0
2. c3t3d0 <SUN ZFS 7120 HARDDISK-1.0-2.00GB>
/pci@0,0/pci8086,2829@d/disk@3,0 <== C3t4d0 & c3t7d0 missing
Specify disk (enter its number): Specify disk (enter its number):
Zpool Status after the Double Disk Failure
root@gurkulunix3:~# zpool status -x
pool: rzpool
state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using zpool online.
see: http://www.sun.com/msg/ZFS-8000-3C
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rzpool UNAVAIL 0 0 0 insufficient replicas
raidz1-0 UNAVAIL 0 0 0 insufficient replicas
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 UNAVAIL 0 0 0 cannot open
c3t7d0 UNAVAIL 0 0 0 cannot open
Conclusion: The /rzpool/r5testfs filesystem is not available for use, and the zpool
cannot be recovered from its current state.
It would be too long to cover the RaidZ2 and RaidZ3 disk failure scenarios here; I will
be posting them as a separate post.
The ZFS file system is a new kind of file system that fundamentally changes the
way file systems are administered, with the below mentioned features:
ZFS Pooled Storage
ZFS uses the concept of storage pools to manage physical storage. Historically,
file systems were constructed on top of a single physical device. To address
multiple devices and provide for data redundancy, the concept of a volume
manager was introduced to provide a representation of a single device so that file
systems would not need to be modified to take advantage of multiple devices.
This design added another layer of complexity and ultimately prevented certain
file system advances because the file system had no control over the physical
placement of data on the virtualized volumes.
ZFS eliminates volume management altogether. Instead of forcing you to create
virtualized volumes, ZFS aggregates devices into a storage pool.
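As a small illustration of pooled storage (the pool, disk and dataset names below are only placeholders), several filesystems can draw space from one shared pool without any slicing or separate volume layer:
# zpool create datapool c1t1d0 c1t2d0
# zfs create datapool/home
# zfs create datapool/projects
# zfs list -r datapool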
Transactional Semantics
ZFS is a transactional file system, which means that the file system state is
always consistent on disk. In a transactional file system, data is managed using
copy-on-write semantics. Data is never overwritten, and any sequence of
operations is either entirely committed or entirely ignored. Thus, the file system
can never be corrupted through accidental loss of power or a system crash.
Although the most recently written pieces of data might be lost, the file system
itself will always be consistent. In addition, synchronous data (written using the
O_DSYNC flag) is always guaranteed to be written before returning, so it is never
lost.
Checksums and Self-Healing Data
With ZFS, all data and metadata is verified using a user-selectable checksum
algorithm. In addition, ZFS provides for self-healing data. ZFS supports storage
pools with varying levels of data redundancy. When a bad data block is detected,
ZFS fetches the correct data from another redundant copy and repairs the bad
data, replacing it with the correct data.
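For example (the dataset name is only a placeholder; sha256 is one of the selectable checksum algorithms), you can change the checksum algorithm and trigger an on-demand verification of every block in the pool with a scrub:
# zfs set checksum=sha256 mypool/myfs
# zpool scrub mypool
# zpool status -v mypool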
Unparalleled Scalability
ZFS is a 128-bit filesystem that allows 256 quadrillion zettabytes of storage. All
metadata is allocated dynamically, so no need exists to preallocate inodes or
otherwise limit the scalability of the file system when it is first created. All the
algorithms have been written with scalability in mind. Directories can have up to
2^48 (256 trillion) entries, and no limit exists on the number of file systems or the
number of files that can be contained within a file system.
ZFS Snapshots
A snapshot is a read-only copy of a file system or volume. Snapshots can be
created quickly and easily. Initially, snapshots consume no additional disk space
within the pool.
As data within the active dataset changes, the snapshot consumes disk space by
continuing to reference the old data. As a result, the snapshot prevents the data
from being freed back to the pool.
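A minimal snapshot workflow looks like this (the dataset and snapshot names are only examples):
# zfs snapshot mypool/myfs@before-change
# zfs list -t snapshot
# zfs rollback mypool/myfs@before-change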
Below is the Quick Reference for ZFS command line Operations
CREATE / DESTROY POOL
Remove a disk from a pool
#zpool detach prod c0t0d0
Delete a pool and all associated filesystems
#zpool destroy prod
Create a pool named prod
#zpool create prod c0t0d0
Create a pool with a different default mount point
#zpool create -m /app/db prod c0t0d0
CREATE RAID-Z / MIRROR
Create RAID-Z vdev / pool
#zpool create raid-pool-1 raidz c3t0d0 c3t1d0 c3t2d0
Add RAID-Z vdev to pool raid-pool-1
#zpool add raid-pool-1 raidz c4t0d0 c4t1d0 c4t2d0
create a RAID-Z1 Storage Pool
#zpool create raid-pool-1 raidz1 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0
create a RAID-Z2 Storage Pool
#zpool create raid-pool-1 raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0
Add a new mirrored vdev to a pool
#zpool add prod mirror c3t0d0 c3t1d0
Force the creation of a mirror and concat
#zpool create -f prod c3t0d0 mirror c4t1d0 c5t2d0
Force the creation of a mirror between two different sized disks
#zpool create -f mypool mirror c2t0d0 c4t0d0
diska is mirrored to diskb
#zpool create mypool mirror diska diskb
diska is mirrored to diskb AND diskc is mirrored to diskd
#zpool create mypool mirror diska diskb mirror diskc diskd
CREATE / DESTROY A FILESYSTEM AND/OR A BLOCKDEVICE
Create a filesystem named db in pool prod
#zfs create prod/db
Create a 5gb block device volume named db in pool prod
#zfs create -V 5gb prod/db
Destroy the filesystem or block device db and associated snapshot(s)
#zfs destroy -fr prod/db
Destroy all datasets in pool prod
#zfs destroy -r prod
MOUNT / UMOUNT zfs
Set the FS mount point to /app/db
#zfs set mountpoint=/app/db prod/db
Mount the zfs filesystem db in pool prod
#zfs mount prod/db
Mount all zfs filesystems
#zfs mount -a
Unmount all zfs filesystems
#zfs umount -a
Unmount the zfs filesystem prod/db
#zfs umount prod/db
LIST ZFS FILESYSTEM INFORMATION
List all zfs filesystems
#zfs list
Listing all properties and settings for a FS
#zfs list -o all
#zfs get all mypool
LIST ZFS POOL INFORMATION
List pool status
# zpool status -x
List individual pool status mypool in detail
# zpool status -v mypool
Listing storage pools brief
# zpool list
Listing name and size
# zpool list -o name,size
Listing without headers / columns
# zpool list -Ho name
SET ZFS FILESYSTEM PROPERTIES
Set a quota on the disk space available to user guest22
#zfs set quota=10G mypool/home/guest22
How to set aside a specific amount of space for a filesystem
#zfs set reservation=10G mypool/prod/test
Enable mounting of a filesystem only through /etc/vfstab
# zfs set mountpoint=legacy mypool/db
and then add the appropriate entry to /etc/vfstab
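A sample /etc/vfstab entry for the legacy-mounted dataset above could look like this (the /db mount point is only an example):
mypool/db - /db zfs - yes -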
NFS share /prod/export/share
# zfs set sharenfs=on prod/export/share
Disable execution of files on /prod/export
# zfs set exec=off prod/export
Set the recordsize to 8k
# zfs set recordsize=8k prod/db
Do not update the file access time record
#zfs set atime=off prod/db/datafiles
Enable data compression
#zfs set compression=on prod/db
Enable fletcher4 type checksum
# zfs set checksum=fletcher4 prod/data
Remove the .snapshot directory visibility from the filesystem
# zfs set snapdir=hidden prod/data
ANALYSE ZFS PERFORMANCE
Display zfs IO statistics every 2 seconds
#zpool iostat 2
Display zfs IO statistics in detail every 2 seconds
#zpool iostat -v 2
zfs FILESYSTEM MAINTENANCE
Scrub all filesystems in pool mypool
# zpool scrub mypool
Temporarily offline a disk (until the next reboot)
#zpool offline -t mypool c0t0d0
Clear the error count by onlining a disk
#zpool online mypool c0t0d0
Clear the error count (without the need to online a disk)
#zpool clear mypool
IMPORT / EXPORT POOLS AND FILESYSTEMS
List pools available for import
#zpool import
Import all pools found in the search directories
#zpool import -a
Search for pools with block devices not located in /dev/dsk (specify the directory with -d)
#zpool import -d <directory>
Search for a pool with block devices created in /zfs
#zpool import -d /zfs prod/data
Import a pool originally named mypool under new name temp
#zpool import mypool temp
Import pool using pool ID
#zpool import 6789123456
Deport a Zfs pool named mypool
#zpool export mypool
Force the unmount and deport of a zfs pool mypool
#zpool export -f mypool
CREATE / DESTROY SNAPSHOTS
Create a snapshot named test of the db filesystem
#zfs snapshot mypool/db@test
List snapshots
#zfs list -t snapshot
Roll back to the tuesday snapshot (recursively destroy intermediate snaps)
#zfs rollback -r prod/prod@tuesday
Roll back and force unmount and remount
#zfs rollback -rf prod/prod@tuesday
Destroy snapshot created earlier
#zfs destroy mypool/db@test
CREATE / DESTROY CLONES
Create a snapshot and then clone that snap
#zfs snapshot prod/prod@12-11-06
#zfs clone prod/prod@12-11-06 prod/prod/clone
Destroy clone
#zfs destroy prod/prod/clone