Resyncing a failed SVM Mirror

In my last post, I replaced a failed hard drive. But what if the drive isn't actually bad, and the mirror just fell out of sync for some other reason? As I mentioned previously, metastat, iostat, and cfgadm will sometimes still show the "failed" drive. In that case, it's quite possible the drive is still functional.

So, here's how you can analyze the hard drive to verify that it is still good, and then resync a mirror built with Solaris Volume Manager on Solaris 9. First we check metastat to find the failed submirrors; the bad drive is c1t0d0. When a drive is truly dead, its line in the Device Relocation Information section at the bottom of the output is often missing. In this failure the drive is still visible, so it may be reusable.

# metastat
d7: Mirror
Submirror 0: d17
State: Needs maintenance
Submirror 1: d27
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 48049848 blocks (22 GB)

d17: Submirror of d7
State: Needs maintenance
Invoke: metareplace d7 c1t0d0s7 <new device>
Size: 48049848 blocks (22 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s7 0 No Maintenance Yes

d27: Submirror of d7
State: Okay
Size: 48049848 blocks (22 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s7 0 No Okay Yes

d3: Mirror
Submirror 0: d13
State: Needs maintenance
Submirror 1: d23
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 8386767 blocks (4.0 GB)

d13: Submirror of d3
State: Needs maintenance
Invoke: metareplace d3 c1t0d0s3 <new device>
Size: 8386767 blocks (4.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s3 0 No Maintenance Yes

d23: Submirror of d3
State: Okay
Size: 8386767 blocks (4.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s3 0 No Okay Yes

d1: Mirror
Submirror 0: d11
State: Needs maintenance
Submirror 1: d21
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 6292242 blocks (3.0 GB)

d11: Submirror of d1
State: Needs maintenance
Invoke: metareplace d1 c1t0d0s1 <new device>
Size: 6292242 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s1 0 No Maintenance Yes

d21: Submirror of d1
State: Okay
Size: 6292242 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s1 0 No Okay Yes

d0: Mirror
Submirror 0: d10
State: Needs maintenance
Submirror 1: d20
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 8386767 blocks (4.0 GB)

d10: Submirror of d0
State: Needs maintenance
Invoke: metareplace d0 c1t0d0s0 <new device>
Size: 8386767 blocks (4.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s0 0 No Maintenance Yes

d20: Submirror of d0
State: Okay
Size: 8386767 blocks (4.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s0 0 No Okay Yes

Device Relocation Information:
Device Reloc Device ID
c1t1d0 Yes id1,sd@SSEAGATE_ST336607LSUN36G_3JA1WZCX00007348J389
c1t0d0 Yes id1,sd@SSEAGATE_ST336607LSUN36G_3JA1B1CW00007337MZU9

Next we check with iostat. The failed drive still appears in this output and shows no errors; a drive that has truly died often shows invalid or incomplete information here:

# iostat -En
c1t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 0312A1B1CW
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c1t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 0322A1WZCX
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
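On a box with many disks, you can scan the iostat -En output for nonzero error counters instead of eyeballing it. A small sketch; the field positions assume the Solaris 9 output format shown above, and flag_disk_errors is just an illustrative name:

```shell
# flag_disk_errors: read `iostat -En` output on stdin and print any
# drive whose soft/hard/transport error counters are nonzero.
flag_disk_errors() {
  awk '/Soft Errors:/ {
    # Fields: <dev> Soft Errors: N Hard Errors: N Transport Errors: N
    if ($4 + $7 + $10 > 0)
      print $1 ": soft=" $4 " hard=" $7 " transport=" $10
  }'
}

# Usage: iostat -En | flag_disk_errors
```

A clean drive prints nothing; anything printed deserves a closer look before you trust it in a mirror again.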

Next we see if it’s available through format and then start testing the drive with a surface analysis:

# format
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
/pci@1c,600000/scsi@2/sd@0,0
1. c1t1d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
/pci@1c,600000/scsi@2/sd@1,0
Specify disk (enter its number): 0
selecting c1t0d0
format> analyze
analyze> read
Ready to analyze (won’t harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y

pass 0
24619/26/53

pass 1
24619/26/53

Total of 0 defective blocks repaired.
analyze> refresh
Ready to analyze (won’t harm data). This takes a long time,
but is interruptable with CTRL-C. Continue? y

pass 0
24619/26/53

pass 1
24619/26/53

Total of 0 defective blocks repaired.
analyze> test
Ready to analyze (won’t harm data). This takes a long time,
but is interruptable with CTRL-C. Continue? y

pass 0 – pattern = 0xc6dec6de
24619/26/53

pass 1 – pattern = 0x6db6db6d
24619/26/53

Total of 0 defective blocks repaired.
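format's analyze is interactive; if you just want a quick non-destructive read pass over the raw device, a plain dd read works too. A sketch, assuming s2 is the conventional whole-disk (backup) slice on your label; read_check is just an illustrative helper name:

```shell
# read_check: read a device (or file) end to end, discarding the data,
# to confirm every block is readable. Read-only; never writes to the source.
read_check() {
  if dd if="$1" of=/dev/null bs=1024k 2>/dev/null; then
    echo "$1: read OK"
  else
    echo "$1: read errors detected"
    return 1
  fi
}

# Usage: read_check /dev/rdsk/c1t0d0s2
```

Unlike analyze's test pass, this only proves the blocks are readable, not writable, so it's a weaker check; it's handy when you can't sit in a format session.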

Since everything went well on the analysis, let's go ahead and reuse this drive and resync the mirrors. You'll notice that I am using the metareplace command, but not quite as metastat recommends: you need the -e parameter to enable and resync the existing device in place (this also works when you have physically replaced the drive).

# metareplace -e d0 c1t0d0s0
d0: device c1t0d0s0 is enabled
# metareplace -e d1 c1t0d0s1
d1: device c1t0d0s1 is enabled
# metareplace -e d3 c1t0d0s3
d3: device c1t0d0s3 is enabled
# metareplace -e d7 c1t0d0s7
d7: device c1t0d0s7 is enabled
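With four mirrors this is easy to type by hand, but on a bigger box you can generate the commands from metastat's own "Invoke:" hints. A sketch that rewrites those hints into the -e form used above (resync_cmds is just an illustrative name; review the generated commands before running anything):

```shell
# resync_cmds: read metastat output on stdin and turn each
# "Invoke: metareplace <mirror> <slice> <new device>" hint into the
# in-place "metareplace -e <mirror> <slice>" form.
resync_cmds() {
  awk '/Invoke: metareplace/ { print "metareplace -e " $3 " " $4 }'
}

# Usage: metastat | resync_cmds        # inspect the output first
#        metastat | resync_cmds | sh   # then execute once you trust the drive
```

Only pipe the result to sh after the surface analysis has passed; -e re-enables the original device, which is exactly what you do not want on a genuinely bad disk.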

You can now see that all the mirrors are re-syncing. Of course, you’ll want to keep an eye on this server to see if it fails again.

# metastat | grep -i resync
State: Resyncing
Resync in progress: 0 % done
State: Resyncing
c1t0d0s7 0 No Resyncing Yes
State: Resyncing
Resync in progress: 2 % done
State: Resyncing
c1t0d0s3 0 No Resyncing Yes
State: Resyncing
Resync in progress: 7 % done
State: Resyncing
c1t0d0s1 0 No Resyncing Yes
State: Resyncing
Resync in progress: 21 % done
State: Resyncing
c1t0d0s0 0 No Resyncing Yes

Don't forget to run installboot if you rebuilt your s0 partition, and to re-add any metadevice database replicas you removed in order to get the box to reboot. For more info, check my previous post.

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0
# metadb -i
flags first blk block count
a m p luo 16 4096 /dev/dsk/c1t1d0s4
a p luo 4112 4096 /dev/dsk/c1t1d0s4
# metadb -a -c 2 c1t0d0s4
# metadb -i
flags first blk block count
a m p luo 16 4096 /dev/dsk/c1t1d0s4
a p luo 4112 4096 /dev/dsk/c1t1d0s4
a u 16 4096 /dev/dsk/c1t0d0s4
a u 4112 4096 /dev/dsk/c1t0d0s4
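A quick way to confirm the replicas ended up balanced across both disks is to count them per disk. A sketch that reads metadb output on stdin (the replica path is the last field); replicas_per_disk is just an illustrative name:

```shell
# replicas_per_disk: count metadb replicas per physical disk by
# stripping the slice number off each replica's /dev/dsk path.
replicas_per_disk() {
  awk '$NF ~ /^\/dev\/dsk\// {
    disk = $NF
    sub(/s[0-9]+$/, "", disk)   # drop the trailing slice, e.g. s4
    count[disk]++
  }
  END { for (d in count) print count[d], d }' | sort -k2
}

# Usage: metadb | replicas_per_disk
```

For the output above this should report two replicas on each disk, which keeps quorum survivable if either single drive dies (subject to the 50% quorum rule the comment below touches on).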







1 Comment


  1. You can also add a line in /etc/system that will allow the box to boot with 50% of metadb’s in place.
    set md:mirrored_root_flag=1

    Posted November 3, 2008, 8:38 am
