Postfix fails to shut down on Linux VM

So I had a problem with a RHEL5 clone (OEL5) Linux VM that would not shut down Postfix cleanly on either a power-off or a reboot. Here’s the error I saw:

Dec 15 16:43:01 testvm postfix[3160]: fatal: could not find any active network interfaces
Dec 15 16:44:04 testvm postfix/postfix-script: starting the Postfix mail system
Dec 15 16:44:04 testvm postfix/master[1995]: daemon started -- version 2.3.3, configuration /etc/postfix

The first trick was realizing that the error was being generated during the shutdown, not during startup. I had checked through my main.cf and verified my /etc/hosts several times, thinking I had a configuration problem within Postfix itself. A virtual machine reboots so fast that it’s easy to miss the reboot boundary in the timestamps.
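In hindsight, the quickest way to see the ordering is to grep the mail log for the fatal and look at what gets logged right after it (assuming syslog is sending mail facility messages to the usual /var/log/maillog):

# grep -A 2 "could not find any active network interfaces" /var/log/maillog # log path assumes the default syslog setup

If the very next lines show postfix-script starting the mail system again, the fatal happened on the way down, not on the way up.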

This VM had the VMware Tools installed, and apparently the installer sets the shutdown (kill) priority of the tools to 08. The VMware Tools take down the network interfaces, and 08 is way too early to be doing that: Postfix is set to shut down at 30 and the normal Linux network interfaces at 90. So, to fix this issue I simply changed the priority of the VMware Tools kill scripts to 89:

mv /etc/rc0.d/K08vmware-tools /etc/rc0.d/K89vmware-tools
mv /etc/rc6.d/K08vmware-tools /etc/rc6.d/K89vmware-tools

I’m sure there is probably a way to do this through chkconfig, but sometimes it’s easier to do it the old-fashioned way. Here’s my guess at how you would do it through chkconfig:

chkconfig vmware-tools off
vi /etc/init.d/vmware-tools # change chkconfig line from 08 to 89
chkconfig vmware-tools on
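If the off/on dance doesn’t pick up the new number, I believe deleting and re-adding the service makes chkconfig rebuild the symlinks from the edited header, which should be the more reliable route (again, untested guesswork on my part):

chkconfig --del vmware-tools
vi /etc/init.d/vmware-tools # change chkconfig line from 08 to 89
chkconfig --add vmware-tools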

Of course, if you reinstall VMware Tools, it will probably replace the symlinks again with the default. Maybe it will even add a second symlink and then you’ll get more errors. I’ll let you know next time I upgrade the tools.
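In the meantime, a quick way to see where the kill priorities stand after any tools upgrade:

# ls -l /etc/rc0.d/K*vmware-tools /etc/rc6.d/K*vmware-tools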

There are several things wrong with the VMware Tools, including their automatic installation of hgfs for folder sharing on a server, which I’ll post about later. I’m still trying to decide whether there is any advantage to installing the tools on a box with no GUI, especially given their poor configuration setup.

Add a hard drive in Linux with LVM

I posted last time about modifying swap space on Linux with LVM, which also introduced expanding a file system to match a new, larger partition. Next up is adding another drive for data storage. I’m just adding one drive, because it is actually a virtual disk presented through VMware that sits on a RAID 5 group on our SAN, so redundancy is not really an issue. This example is from a Red Hat Enterprise Linux (RHEL5) clone and uses Logical Volume Management (LVM), but it should carry over to other Linux distributions.

Since this is a VM, I simply attached a new blank hard drive through VirtualCenter. Now, of course, I could have rebooted the server to pick up the new drive, but what fun is that? So, the question then became, how do you detect a new hard drive without rebooting in Linux? Currently, it looks like this:

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02

In Solaris, you simply run devfsadm. In Linux, you just need to have the OS rescan the SCSI bus, which we do through the /proc file system. Determine the parameters of the new SCSI drive and then run the following command:

# echo "scsi add-single-device 0 0 1 0" > /proc/scsi/scsi

The four numbers in order are host, channel, id, and LUN. Now, when we check we see the new drive:

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02

At the same time, you will see the following lines in /var/log/messages:

kernel: Vendor: VMware Model: Virtual disk Rev: 1.0
kernel: Type: Direct-Access ANSI SCSI revision: 02
kernel: target0:0:1: Beginning Domain Validation
kernel: target0:0:1: Domain Validation skipping write tests
kernel: target0:0:1: Ending Domain Validation
kernel: target0:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU RDSTRM RTI WRFLOW PCOMP (6.25 ns, offset 127)
kernel: SCSI device sdb: 104857600 512-byte hdwr sectors (53687 MB)
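As an aside, 2.6 kernels that expose the sysfs SCSI layer also let you rescan a whole adapter, with wildcards for channel, target, and LUN, instead of spelling out a single device. A sketch, assuming host0 is the controller the new disk hangs off of:

# echo "- - -" > /sys/class/scsi_host/host0/scan # host0 is an assumption; check /sys/class/scsi_host/

Either way, the new disk should show up in /proc/scsi/scsi and /var/log/messages just like above.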

Since this is a brand new drive, we need to set up whatever partitions we want on it. If you’re not sure of the device name, you can always run fdisk -l first:

# fdisk -l
Disk /dev/sda: 12.8 GB, 12884901888 bytes
255 heads, 63 sectors/track, 1566 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 1566 12474472+ 8e Linux LVM

Disk /dev/sdb: 53.6 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn’t contain a valid partition table

Go ahead and set up the partitions; I just need one:

# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won’t be recoverable.

The number of cylinders for this disk is set to 6527.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-6527, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-6527, default 6527):
Using default value 6527

Command (m for help): p

Disk /dev/sdb: 53.6 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 6527 52428096 83 Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Now you can see the partitions are defined:

# fdisk -l
Disk /dev/sda: 12.8 GB, 12884901888 bytes
255 heads, 63 sectors/track, 1566 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 1566 12474472+ 8e Linux LVM

Disk /dev/sdb: 53.6 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 6527 52428096 83 Linux

Now, if you don’t have LVM, you would just make the new file system directly on the partition. (If you are using LVM, don’t do this; skip ahead to the next step.)

# mkfs.ext3 /dev/sdb1
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
6553600 inodes, 13107024 blocks
655351 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
400 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
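If you stop there and skip LVM, you would finish up the same way I do for the LVM volume below: add a line to /etc/fstab pointing at the partition itself and mount it. A quick sketch, assuming you want the same /data mount point I use later:

/dev/sdb1 /data ext3 defaults 1 2

# mkdir /data
# mount /data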

Since we have LVM, we will be using that instead. Using lvmdiskscan, we can see that the new partition is available:

# lvmdiskscan
…skip…
/dev/sdb [ 50.00 GB]
/dev/sdb1 [ 50.00 GB]

The first step is to turn it into a new physical volume:

# pvcreate /dev/sdb1
Physical volume "/dev/sdb1" successfully created

Now we create a volume group and add the physical volume to it:

# vgcreate VolGroup01 /dev/sdb1
Volume group "VolGroup01" successfully created

And then create the logical volume (using all of the available space on the volume group):

# lvcreate -l 100%FREE -n LogVol00 VolGroup01
Logical volume "LogVol00" created

Now we make the file system on the logical volume:

# mkfs.ext3 /dev/VolGroup01/LogVol00
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
6553600 inodes, 13106176 blocks
655308 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
400 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 31 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

Add the appropriate line to /etc/fstab:

/dev/VolGroup01/LogVol00 /data ext3 defaults 1 1

Create the mount point and mount the new file system:

# mkdir /data
# mount /data

See that it’s now available to the OS:

# df -h /data
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup01-LogVol00
50G 180M 47G 1% /data
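If you want a quick one-screen summary of the new LVM layout, the reporting commands are handy too (just listing them here, since the exact output format varies between LVM versions):

# pvs
# vgs
# lvs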

All done!

Modify swap space in Linux (with LVM)

I posted a while back about how to modify swap space in Solaris 10; now I will show you how to do it in Linux. This example is from a Red Hat Enterprise Linux (RHEL5) clone and uses Logical Volume Management (LVM), but it should carry over to other Linux distributions, and you could use fdisk to resize the partition instead of LVM. I will specifically show how to reduce swap space here, but it is applicable to enlarging it as well.

To see which device is being used for your current swap space, check /proc/swaps:

# cat /proc/swaps
Filename Type Size Used Priority
/dev/mapper/VolGroup00-LogVol01 partition 4194296 0 -1

And then to see how much swap is in use, run the free command:

# free
total used free shared buffers cached
Mem: 385560 76388 309172 0 11328 33788
-/+ buffers/cache: 31272 354288
Swap: 4194296 0 4194296

If your swap space is in use, you will need to reboot into single user mode, shut some applications down until it is free, or temporarily add a separate swap drive or file (a quick sketch of the swap file approach follows the swapoff step below). Once your swap space is free to be modified, turn off the swap space that you want to modify:

# swapoff -v /dev/VolGroup00/LogVol01
swapoff on /dev/VolGroup00/LogVol01
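If you went the temporary swap file route to free up the main swap, here is a minimal sketch of what that looks like (the /swapfile.tmp name and the 1 GB size are just placeholders):

# dd if=/dev/zero of=/swapfile.tmp bs=1M count=1024 # arbitrary name and size
# mkswap /swapfile.tmp
# swapon /swapfile.tmp

Once the resized swap volume is back in service at the end, swapoff and delete the temporary file:

# swapoff /swapfile.tmp
# rm /swapfile.tmp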

You can now verify that the swap space is no longer in use:

# free
total used free shared buffers cached
Mem: 385560 74404 311156 0 11340 33780
-/+ buffers/cache: 29284 356276
Swap: 0 0 0
# cat /proc/swaps

Now that the swap space is not in use, we are free to modify it. In this case, I am reducing the 4 GB set aside for swap to 1 GB so that I can reuse the space on my root partition. Here’s the layout of my current LVM partitions:

# lvdisplay
--- Logical volume ---
LV Name /dev/VolGroup00/LogVol00
VG Name VolGroup00
LV UUID jzjdTT-Md9K-iP52-3kv4-OqSL-2Y0c-yxUq7o
LV Write Access read/write
LV Status available
# open 1
LV Size 7.88 GB
Current LE 252
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:0

--- Logical volume ---
LV Name /dev/VolGroup00/LogVol01
VG Name VolGroup00
LV UUID ixEabw-7lMg-ho6h-GcVq-pEOE-rHPb-X3HY3e
LV Write Access read/write
LV Status available
# open 1
LV Size 4.00 GB
Current LE 128
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:1

Now to shrink the swap space partition. The fact that shrinking the partition is destructive doesn’t matter since it is just swap space anyway:

# lvm lvreduce /dev/VolGroup00/LogVol01 -L -3G
WARNING: Reducing active logical volume to 1.00 GB
THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce LogVol01? [y/n]: y
Reducing logical volume LogVol01 to 1.00 GB
Logical volume LogVol01 successfully resized

And now to extend the root partition. (Increasing the size is non-destructive and can be done while the partition is in use.):

# lvm lvextend -l +100%FREE /dev/VolGroup00/LogVol00
Extending logical volume LogVol00 to 10.88 GB
Logical volume LogVol00 successfully resized

Let’s take a look at the new partition sizes:

# lvdisplay
--- Logical volume ---
LV Name /dev/VolGroup00/LogVol00
VG Name VolGroup00
LV UUID jzjdTT-Md9K-iP52-3kv4-OqSL-2Y0c-yxUq7o
LV Write Access read/write
LV Status available
# open 1
LV Size 10.88 GB
Current LE 348
Segments 2
Allocation inherit
Read ahead sectors 0
Block device 253:0

--- Logical volume ---
LV Name /dev/VolGroup00/LogVol01
VG Name VolGroup00
LV UUID ixEabw-7lMg-ho6h-GcVq-pEOE-rHPb-X3HY3e
LV Write Access read/write
LV Status available
# open 0
LV Size 1.00 GB
Current LE 32
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:1

Now to remake the smaller partition into usable swap space:

# mkswap /dev/VolGroup00/LogVol01
Setting up swapspace version 1, size = 1073737 kB

And then add it back to the OS as active swap space:

# swapon -va
swapon on /dev/VolGroup00/LogVol01

Verify that the swap space is back:

# cat /proc/swaps
Filename Type Size Used Priority
/dev/mapper/VolGroup00-LogVol01 partition 1048568 0 -2

# free
total used free shared buffers cached
Mem: 385560 76156 309404 0 11668 35036
-/+ buffers/cache: 29452 356108
Swap: 1048568 0 1048568

Let’s go ahead and resize the actual file system on the root partition to take into account the newly available space. First, a before snapshot:

# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
7.7G 764M 6.5G 11% /

Now to increase it. Note that if you do not specify a new size, it will automatically grow to fill the underlying partition:

# resize2fs -p /dev/mapper/VolGroup00-LogVol00
resize2fs 1.39 (29-May-2006)
Filesystem at /dev/mapper/VolGroup00-LogVol00 is mounted on /; on-line resizing required
Performing an on-line resize of /dev/mapper/VolGroup00-LogVol00 to 2850816 (4k) blocks.
The filesystem on /dev/mapper/VolGroup00-LogVol00 is now 2850816 blocks long.

Verify the new size:

# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
11G 766M 9.3G 8% /

Next post I’ll explain how to add a second drive to this system.

Resyncing a failed SVM Mirror

So in my last post, I replaced a failed hard drive. What if the drive isn’t bad? Maybe the mirror just got out of sync for some other reason. As I mentioned previously, sometimes metastat, iostat and cfgadm will still show the failed drive. In this case, it is very possible that the hard drive is still functional.

So, here’s how you can analyze the hard drive to verify that it is still good and then resync a mirror built with Solaris Volume Manager on Solaris 9. First we check metastat, find the failed mirrors, and see that the bad drive is c1t0d0. The line for the failed drive at the bottom of the output, under Device Relocation Information, is often missing entirely when the drive is dead; on this failure the drive is still visible and may be reusable.

# metastat
d7: Mirror
Submirror 0: d17
State: Needs maintenance
Submirror 1: d27
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 48049848 blocks (22 GB)

d17: Submirror of d7
State: Needs maintenance
Invoke: metareplace d7 c1t0d0s7 <new device>
Size: 48049848 blocks (22 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s7 0 No Maintenance Yes

d27: Submirror of d7
State: Okay
Size: 48049848 blocks (22 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s7 0 No Okay Yes

d3: Mirror
Submirror 0: d13
State: Needs maintenance
Submirror 1: d23
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 8386767 blocks (4.0 GB)

d13: Submirror of d3
State: Needs maintenance
Invoke: metareplace d3 c1t0d0s3 <new device>
Size: 8386767 blocks (4.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s3 0 No Maintenance Yes

d23: Submirror of d3
State: Okay
Size: 8386767 blocks (4.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s3 0 No Okay Yes

d1: Mirror
Submirror 0: d11
State: Needs maintenance
Submirror 1: d21
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 6292242 blocks (3.0 GB)

d11: Submirror of d1
State: Needs maintenance
Invoke: metareplace d1 c1t0d0s1 <new device>
Size: 6292242 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s1 0 No Maintenance Yes

d21: Submirror of d1
State: Okay
Size: 6292242 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s1 0 No Okay Yes

d0: Mirror
Submirror 0: d10
State: Needs maintenance
Submirror 1: d20
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 8386767 blocks (4.0 GB)

d10: Submirror of d0
State: Needs maintenance
Invoke: metareplace d0 c1t0d0s0 <new device>
Size: 8386767 blocks (4.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s0 0 No Maintenance Yes

d20: Submirror of d0
State: Okay
Size: 8386767 blocks (4.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s0 0 No Okay Yes

Device Relocation Information:
Device Reloc Device ID
c1t1d0 Yes id1,sd@SSEAGATE_ST336607LSUN36G_3JA1WZCX00007348J389
c1t0d0 Yes id1,sd@SSEAGATE_ST336607LSUN36G_3JA1B1CW00007337MZU9

Then we check with iostat. The failed drive shows up in this output too, with no errors logged; drives that have truly failed often show invalid or incomplete information here:

# iostat -En
c1t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 0312A1B1CW
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c1t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 0322A1WZCX
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

Next we see if it’s available through format and then start testing the drive with a surface analysis:

# format
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
/pci@1c,600000/scsi@2/sd@0,0
1. c1t1d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
/pci@1c,600000/scsi@2/sd@1,0
Specify disk (enter its number): 0
selecting c1t0d0
format> analyze
analyze> read
Ready to analyze (won’t harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y

pass 0
24619/26/53

pass 1
24619/26/53

Total of 0 defective blocks repaired.
analyze> refresh
Ready to analyze (won’t harm data). This takes a long time,
but is interruptable with CTRL-C. Continue? y

pass 0
24619/26/53

pass 1
24619/26/53

Total of 0 defective blocks repaired.
analyze> test
Ready to analyze (won’t harm data). This takes a long time,
but is interruptable with CTRL-C. Continue? y

pass 0 - pattern = 0xc6dec6de
24619/26/53

pass 1 - pattern = 0x6db6db6d
24619/26/53

Total of 0 defective blocks repaired.

Since everything went well in the analysis, let’s go ahead and reuse this drive and resync the mirrors. You’ll notice that I am using the metareplace command, but not quite the way metastat recommends: you need the -e parameter to have it resync the original device in place (this also works when you have physically replaced the drive).

# metareplace -e d0 c1t0d0s0
d0: device c1t0d0s0 is enabled
# metareplace -e d1 c1t0d0s1
d1: device c1t0d0s1 is enabled
# metareplace -e d3 c1t0d0s3
d3: device c1t0d0s3 is enabled
# metareplace -e d7 c1t0d0s7
d7: device c1t0d0s7 is enabled

You can now see that all the mirrors are re-syncing. Of course, you’ll want to keep an eye on this server to see if it fails again.

# metastat | grep -i resync
State: Resyncing
Resync in progress: 0 % done
State: Resyncing
c1t0d0s7 0 No Resyncing Yes
State: Resyncing
Resync in progress: 2 % done
State: Resyncing
c1t0d0s3 0 No Resyncing Yes
State: Resyncing
Resync in progress: 7 % done
State: Resyncing
c1t0d0s1 0 No Resyncing Yes
State: Resyncing
Resync in progress: 21 % done
State: Resyncing
c1t0d0s0 0 No Resyncing Yes

Don’t forget to run installboot if you rebuilt your s0 partition, and to re-add any metadevice database replicas you removed to get through a reboot. For more info, check my previous post.

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0
# metadb -i
flags first blk block count
a m p luo 16 4096 /dev/dsk/c1t1d0s4
a p luo 4112 4096 /dev/dsk/c1t1d0s4
# metadb -a -c 2 c1t0d0s4
# metadb -i
flags first blk block count
a m p luo 16 4096 /dev/dsk/c1t1d0s4
a p luo 4112 4096 /dev/dsk/c1t1d0s4
a u 16 4096 /dev/dsk/c1t0d0s4
a u 4112 4096 /dev/dsk/c1t0d0s4

Replacing a hard drive with Solaris Volume Manager

Last time I posted about my experience replacing a drive in an array created with Veritas Volume Manager. That was a RAID 0 array that lost its data the moment the drive died, so I didn’t worry about saving the data when rebuilding it. This time I was rebuilding a RAID 1 array built with Solaris Volume Manager. When I lost this hard drive, the data was still intact on the remaining functional drive and I wanted to keep it that way. As far as disclaimers go, I kept my data with no problems, but don’t just copy and paste commands without understanding how they will affect your system. Every server is different, so don’t assume yours is set up the same as mine!

This is a Solaris 9 server with two IDE drives set up into four separate mirrors: d0, d1, d3, and d7. Since there are two hard drives, there are two components, or submirrors, under each mirror. The first drive’s submirror is always named 1x, where x is the number of the parent mirror, and the second drive’s is always 2x. So d0 is the parent mirror, d10 is the drive one submirror, and d20 is the drive two submirror. Make sense? Let’s go!
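To make that concrete, here is the full mapping on this box (it matches the md.cf dump further down):

d0: d10 (c0t0d0s0) + d20 (c0t2d0s0)
d1: d11 (c0t0d0s1) + d21 (c0t2d0s1)
d3: d13 (c0t0d0s3) + d23 (c0t2d0s3)
d7: d17 (c0t0d0s7) + d27 (c0t2d0s7)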

The first problem shows up when you have two drives, don’t notice that you lost one, and then the system reboots or restarts for some reason. The problem is the metadevice database replicas. Sun recommends storing at least one metadevice database replica on each of your hard drives, so if you only have two drives, you very likely have an even number of replicas. Solaris 9 uses a majority consensus algorithm to detect stale databases and will not finish booting without more than half of the total replicas online. With two replicas per drive, losing one drive leaves you with exactly half, which is not a majority, so the boot stops. Here’s the console output for this situation:

metainit: hostname: stale databases

Insufficient metadevice database replicas located.

Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.

Type control-d to proceed with normal startup,
(or give root password for system maintenance):

The solution is to simply remove the bad replicas and reboot. First check to see how the replicas are defined:

# metadb -i
flags first blk block count
M p 16 unknown /dev/dsk/c0t0d0s4
M p 4112 unknown /dev/dsk/c0t0d0s4
a m p lu 16 4096 /dev/dsk/c0t2d0s4
a p l 4112 4096 /dev/dsk/c0t2d0s4

Remove the replicas from the dead drive:

# metadb -d c0t0d0s4
metadb: rembrandt: c0t0d0s4: no metadevice database replica on device

Recheck that they’re gone:

# metadb -i
flags first blk block count
a m p lu 16 4096 /dev/dsk/c0t2d0s4
a p l 4112 4096 /dev/dsk/c0t2d0s4

Once you log out of the maintenance shell, the system will continue booting and then reset itself once more before coming up fully.

# exit
logout
Resuming system initialization. Metadevice database will remain stale.

Once the system is back up and running, we can go about replacing the bad drive. The first thing you want to do is back up your LVM configs:

# cp -r /etc/lvm /root/lvm.backup

Now take a look at which mirrors/drives are down. To keep the post shorter, I am just showing one mirror here; the others look the same. You’ll also notice at the end that in my case the dead drive (c0t0d0) isn’t showing up at all under Device Relocation Information, which is a pretty good sign that the drive is gone. If it’s still listed, there’s a chance it’s still good. (I’ll go over that scenario in my next post.)

# metastat

… <SNIP> …

d0: Mirror
Submirror 0: d10
State: Needs maintenance
Submirror 1: d20
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 6295440 blocks (3.0 GB)

d10: Submirror of d0
State: Needs maintenance
Invoke: metareplace d0 c0t0d0s0
Size: 6295440 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t0d0s0 0 No Maintenance Yes

d20: Submirror of d0
State: Okay
Size: 6295440 blocks (3.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t2d0s0 0 No Okay Yes

Device Relocation Information:
Device Reloc Device ID
c0t2d0 Yes id1,dad@AWDC_WD1200BB-00CAA1=WD-WMA8C2168885

This can also be verified through iostat. If the drive still shows up here, you get some useful information like model and serial numbers. If it’s gone, the output is still useful: you know which drives are good, and by process of elimination, which one is bad.

# iostat -En
c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: WDC WD1200BB-00C Revision: 17.07W17 Serial No: WD-WMA8C2168885
Size: 120.03GB <120031641600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0

This can also be verified through format and cfgadm:

# format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
0. c0t2d0
/pci@1f,0/ide@d/dad@2,0
Specify disk (enter its number):

# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t2d0 disk connected configured unknown
c0::dsk/c0t3d0 CD-ROM connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok

Since I had all the information I needed, I shut down the server (these were IDE drives, not hot-swappable) and replaced the faulty drive with another one of the same model and size. Once booted, I went into format; it saw the drive, so I went ahead and cleared the partitions, since I had used this drive previously for something else. I’m not going to walk through all the format screens, they’re self-explanatory: format > 0 > part > zero out the partitions (all but partition 2) > label > quit > quit. Next, I needed to partition the drive to match the drive it was going to mirror. The prtvtoc command makes this easy; just make sure you type the right device names:

# prtvtoc -h /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2
fmthard: New volume table of contents now in place.

You can verify that both partition tables match now if you like, compare the output of the two commands:

# prtvtoc -h /dev/rdsk/c0t0d0s2
# prtvtoc -h /dev/rdsk/c0t2d0s2
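Or let diff do the eyeballing for you, a small sketch using temp files (no output means the tables match):

# prtvtoc -h /dev/rdsk/c0t0d0s2 > /tmp/vtoc.new # temp file names are arbitrary
# prtvtoc -h /dev/rdsk/c0t2d0s2 > /tmp/vtoc.good
# diff /tmp/vtoc.new /tmp/vtoc.good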

Now go into your LVM backup and cat out md.cf; copy it to another window, since you will be referring back to it several times. The c0t0d0 lines are the ones I needed:

# cd /root/lvm.backup/
# cat md.cf
d3 -m d13 d23 1
d13 1 1 c0t0d0s3
d23 1 1 c0t2d0s3
d7 -m d17 d27 1
d17 1 1 c0t0d0s7
d27 1 1 c0t2d0s7
d1 -m d11 d21 1
d11 1 1 c0t0d0s1
d21 1 1 c0t2d0s1
d0 -m d10 d20 1
d10 1 1 c0t0d0s0
d20 1 1 c0t2d0s0

Now we will go through the process of detaching the failed submirror, clearing it, rebuilding it, and then reattaching it. Do this for every submirror that has failed. I will show the output from the first round, then just the commands for the additional rounds. The third command (metainit) takes its arguments straight from the md.cf file:

# metadetach -f d3 d13
d3: submirror d13 is detached
# metaclear d13
d13: Concat/Stripe is cleared
# metainit d13 1 1 c0t0d0s3
d13: Concat/Stripe is setup
# metattach d3 d13
d3: submirror d13 is attached

Now you can check the status of the resync with metastat:

# metastat d3
d3: Mirror
Submirror 0: d13
State: Resyncing
Submirror 1: d23
State: Okay
Resync in progress: 12 % done
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 4198320 blocks (2.0 GB)

d13: Submirror of d3
State: Resyncing
Size: 4198320 blocks (2.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t0d0s3 0 No Okay Yes

d23: Submirror of d3
State: Okay
Size: 4198320 blocks (2.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t2d0s3 0 No Okay Yes

Device Relocation Information:
Device Reloc Device ID
c0t0d0 Yes id1,dad@AWDC_WD1200BB-00CAA1=WD-WMA8C2114374
c0t2d0 Yes id1,dad@AWDC_WD1200BB-00CAA1=WD-WMA8C2168885

Now run through the rest of the commands:

# metadetach -f d7 d17
# metaclear d17
# metainit d17 1 1 c0t0d0s7
# metattach d7 d17

# metadetach -f d1 d11
# metaclear d11
# metainit d11 1 1 c0t0d0s1
# metattach d1 d11

# metadetach -f d0 d10
# metaclear d10
# metainit d10 1 1 c0t0d0s0
# metattach d0 d10
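Just to show the pattern, those remaining rounds could also be driven from the mirror/submirror/slice triples in one go. This is only a sketch; I would still run them one at a time on anything I cared about, after double-checking the triples against your own md.cf:

while read mir sub slice; do
        metadetach -f $mir $sub
        metaclear $sub
        metainit $sub 1 1 $slice
        metattach $mir $sub
done <<EOF
d7 d17 c0t0d0s7
d1 d11 c0t0d0s1
d0 d10 c0t0d0s0
EOF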

Of course, you can check any of the mirrors or submirrors at any time to see where things stand. If you want to check the status of multiple rebuilds at once, just run:

# metastat | grep Resync
State: Resyncing
Resync in progress: 2 % done
State: Resyncing
State: Resyncing
Resync in progress: 99 % done
State: Resyncing
State: Resyncing
Resync in progress: 1 % done
State: Resyncing

Once the drives have finished syncing, we need to make sure that we can still boot off the new drive in case the other one fails:

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0

And, of course, we need to re-add the metadevice database replicas for the same reason:

# metadb -a -c 2 c0t0d0s4
# metadb -i
flags first blk block count
a u 16 4096 /dev/dsk/c0t0d0s4
a u 4112 4096 /dev/dsk/c0t0d0s4
a m p luo 16 4096 /dev/dsk/c0t2d0s4
a p luo 4112 4096 /dev/dsk/c0t2d0s4