Adding a new LUN on Solaris 9 with VXFS and PowerPath 

There’s not a lot of documentation out there on working with LUNs on servers that use both Veritas Storage Foundation and EMC PowerPath. I have spent the last few weeks perfecting the process of Dynamic LUN Expansion and will be adding more posts about that process in the near future. Most recently I have been working on Solaris 9 servers connected to an EMC CLARiiON CX500 SAN. Their LUNs are mounted with the VxFS (Veritas) file system (version 4.1,REV=4.1B18_sol_GA_s10b74L2a) and multipathed with EMC PowerPath (version 4.5.0_b169). The servers have QLogic HBAs, so I use QLogic’s utilities to scan the Fibre Channel paths. If you have a different HBA, use that vendor’s tools.

This setup, although not unusual, complicates adding a LUN. When a new LUN is connected to the host, Veritas will by default see it first and claim the drive. This is a bad thing, because we want PowerPath to find the drive, set up the pseudo drive and encapsulate the multipathing.

Previous administrators with this setup were unable to get PowerPath to recognize the drive before Veritas without rebooting the server. They would connect the LUN and reboot the server. If Veritas detected the drive first, they would remove the disk from Veritas and reboot until PowerPath claimed the drive. This was unacceptable to me, so I decided to document the procedure I follow for properly connecting a new LUN to a host with this configuration. (Note: If you have already gotten yourself into this situation, you will still need to reboot to clean up Veritas’ hold on the drive. The alternative to rebooting is to allocate another LUN for the host and worry about cleaning up the previous one later.)

So how do you get Veritas to not claim the drive when you connect the LUN to the host? Simple: turn off the service that discovers the drive! If you dig into the init scripts for Veritas (/etc/init.d/vxvm-startup2), you’ll see the following lines starting on line 193:

# Start Event Source Daemon, to enable dynamic device discovery
# Empty the lock file before starting the daemon
> /etc/vx/.vxesd.lock

vxddladm start eventsource

This dynamic device discovery service must be turned off by executing the following command:

# vxddladm stop eventsource

Of course, if you don’t want the dynamic discovery service running at all, you should also comment out line 197 (the vxddladm start eventsource line) so that it is never started in the first place. I have found that it is necessary to run the stop command multiple times to make sure it is off. Even if I have stopped it on a particular server before, I run it again before connecting any LUNs, just to make sure.
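
Since I end up running the stop command several times anyway, a small wrapper can do the repetition. This is only a sketch: repeat_cmd is a hypothetical helper name, and three is an arbitrary number of attempts, not a value the Veritas documentation recommends.

```shell
# Hypothetical helper: run the given command a fixed number of times.
# Usage: repeat_cmd <count> <command> [args...]
repeat_cmd() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        # Run the command; report a non-zero exit but keep going.
        "$@" || echo "attempt `expr $i + 1` returned non-zero" >&2
        i=`expr $i + 1`
    done
}

# On the host itself this would be, for example:
#   repeat_cmd 3 vxddladm stop eventsource
```

The helper does not confirm that the event source daemon is really stopped; it only repeats the command.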

So here are the steps with sample output if any was generated:

Turn off Veritas dynamic discovery (IMPORTANT!!):

# vxddladm stop eventsource

Rescan your fibre connections, optionally verify that you can see the new LUN:

# /opt/QLogic_Corporation/SANsurferCLI/scli -do all rescan
Driver rescan completed on HBA instance 0.
Driver rescan completed on HBA instance 1.

Tell Solaris you’ve made some changes and let it make the proper devices:

# devfsadm -v -C

Configure PowerPath to work with the new LUN:

# /etc/powercf -q
# /etc/powermt config
# /etc/powermt save

Check that your new LUN is properly configured in PowerPath:

# /etc/powermt display dev=all
— cut —
Pseudo name=emcpower6a
CLARiiON ID=APM00012345678 [Test Storage Group]
Logical device ID=60060160A42611000A2B03D123456789 [LUN – TEMP LUN]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP A, current=SP A
==============================================================================
---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---
###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors
==============================================================================
1280 pci@9/fibre-channel@2     c4t1d6s0  SP B0     active  alive      0      0
1280 pci@9/fibre-channel@2     c4t2d6s0  SP A0     active  alive      0      0
1281 pci@9/fibre-channel@2     c5t1d6s0  SP B1     active  alive      0      0
1281 pci@9/fibre-channel@2     c5t2d6s0  SP A1     active  alive      0      0
— cut —

Now we need to create a label on the drive. Select it by its PowerPath pseudo name:

# format
Searching for disks…done

c4t1d6: configured with capacity of 349.99GB
c4t2d6: configured with capacity of 349.99GB
c5t1d6: configured with capacity of 349.99GB
c5t2d6: configured with capacity of 349.99GB
emcpower6a: configured with capacity of 349.99GB

AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w210000008710a5b2,0
— cut —
40. emcpower6a
/pseudo/emcp@6
Specify disk (enter its number): 40
selecting emcpower6a
[disk formatted]
Disk not labeled. Label it now? yes
format> q

Get rid of the default partitions created by the label command:

# fmthard -d 0:00:00:0:0 /dev/rdsk/emcpower6a
# fmthard -d 1:00:00:0:0 /dev/rdsk/emcpower6a
# fmthard -d 6:00:00:0:0 /dev/rdsk/emcpower6a

Verify that everything looks the way you want it:

# format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w210000008710a5b2,0
— cut —
40. emcpower6a
/pseudo/emcp@6
Specify disk (enter its number): 40
selecting emcpower6a
[disk formatted]
format> verify

Primary label contents:

Volume name =
ascii name =
pcyl = 57344
ncyl = 57342
acyl = 2
nhead = 256
nsect = 50
Part      Tag    Flag     Cylinders        Size            Blocks
  0 unassigned    wm       0               0         (0/0/0)             0
  1 unassigned    wm       0               0         (0/0/0)             0
  2     backup    wu       0 - 57341     349.99GB    (57342/0/0) 733977600
  3 unassigned    wm       0               0         (0/0/0)             0
  4 unassigned    wm       0               0         (0/0/0)             0
  5 unassigned    wm       0               0         (0/0/0)             0
  6 unassigned    wm       0               0         (0/0/0)             0
  7 unassigned    wm       0               0         (0/0/0)             0

format> q

If you’ve done everything right, Veritas should not see your new drive yet. Verify:

# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
emcpower0s2  auto:sliced     devdg-erp02  devdg-erp    online
emcpower1s2  auto:cdsdisk    testdg-erp02 testdg-erp   online
emcpower2s2  auto:sliced     devdg-dw02   devdg-dw     online
emcpower3s2  auto:sliced     devdg-tle01  devdg-tle    online
emcpower4s2  auto:sliced     dbadg-erp01  dbadg-erp    online

Now that everything is set up right, go ahead and tell Veritas to look for new drives. Important Note: If it is not set up right at this point, Veritas will claim the raw device and it’s a pain to clean up!

# vxdctl enable

Now, you will see the pseudo drive in the list. If it’s a raw device name, you messed up:

# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
emcpower0s2  auto:sliced     devdg-erp02  devdg-erp    online
emcpower1s2  auto:cdsdisk    testdg-erp02 testdg-erp   online
emcpower2s2  auto:sliced     devdg-dw02   devdg-dw     online
emcpower3s2  auto:sliced     devdg-tle01  devdg-tle    online
emcpower4s2  auto:sliced     dbadg-erp01  dbadg-erp    online
emcpower6s2  auto:none       -            -            online invalid
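
If you would rather not eyeball the list, the check can be scripted. The sketch below assumes that a "raw" device shows up with a native cXtYdZ-style name (for example c4t1d6s2) in the first column; find_raw_devices is a hypothetical helper name, and the pattern is my assumption rather than anything Veritas defines.

```shell
# Hypothetical helper: read `vxdisk -o alldgs list` output on stdin and
# print every device whose name looks like a native cXtYdZ path instead of
# a PowerPath emcpower pseudo device. The name pattern is an assumption.
find_raw_devices() {
    # NR > 1 skips the header line; $1 is the DEVICE column.
    awk 'NR > 1 && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+/ { print $1 }'
}

# On the host itself this would be, for example:
#   vxdisk -o alldgs list | find_raw_devices
```

No output means no native names were found. Internal disks that are not managed by PowerPath would also match the pattern, so treat any output as a prompt to look closer rather than as proof of a mistake.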

You should now see new devices in /dev/vx/dmp and /dev/vx/rdmp. What’s next? Well, that’s entirely up to you. Now that you’ve got the right drive in Veritas, you can set it up any way you want: add it to an existing disk group, mirror a disk group for some added redundancy, create a new disk group, etc. In upcoming posts, I’ll detail how to create a new disk group with this LUN and then start digging into dynamically resizing file systems while everything is still online and available to the users.

Random Perl Snippet 

Something that I get to do occasionally at work is review old code to fix some random bug. I actually enjoy doing this because I rarely get to program anymore (I have never been a developer) and it’s nice to brush up on it again. The other day, I ran into some code I hadn’t seen before that is valid Perl, but is much more C-like:

return ($n < 0) ? 0 : $n;

The biggest problem with trying to figure out what this does, if you’ve never seen it before, is that you have no idea what to search for. Searching for “?:” doesn’t help, because that actually means something else. Google Code Search has been my friend in the past on things like this, because it can handle all the special characters. I highly recommend trying it out. This time I figured out it was an old C construct (the ternary conditional operator), so I just went over and asked a developer what it meant.

So, for those who didn’t know already, this is what it means:

($n < 0) is the condition that is being checked (like in an if statement).

The part between the ? and the : is what is returned if the condition is true. The part after the : is what is returned if the condition is false.

So in this case, the author was making sure that the number returned was never negative. If $n was less than zero, it would return zero. If it was zero or some positive number, it just returned that number.

Now, why would you do this? Well, it would all depend on how the variable was derived in the subroutine above it. In my case, the author was using a creative method to find the total number of a specific character in a string (in this case the slash):

my $str = shift(@_);
my $n = split('/', $str) - 1;

Now, although this works, Perl will complain about it because you are clobbering the array @_ and consequently your subroutine arguments. So I ended up changing the code to use the transliteration operator “tr”, which returns the number of characters it matched (the slash has to be escaped because it is also the delimiter):

my $n = ($str =~ tr/\///);

Of course, now the check in the return statement is redundant, since tr never returns a negative count, but it isn’t causing any harm…

Find open ports in Solaris 9 

In many operating systems, you can find the connection between a specific process and an open port using lsof, as I mentioned in another post. What if you don’t have lsof, like in Solaris 9? Well, if you have pfiles, you can do the same thing by looping through the current processes and checking which ports they have open:

for i in `ps -A | grep -v PID | awk '{print $1}'`; do echo PID:$i; pfiles $i | grep port; done

To break it down:

This gives you all the PIDs of running processes:

ps -A | grep -v PID | awk '{print $1}'

Loop through them with the for loop, and then print the PID and the open ports of that PID to the screen:

echo PID:$i; pfiles $i | grep port;
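
The loop prints a PID: line for every process, including the many that have no ports open. A small filter can hide those. This is a sketch only: only_with_ports is a hypothetical helper name, and it assumes the input is exactly what the loop above prints (a PID: header followed by zero or more port lines).

```shell
# Hypothetical filter: reads the "PID:<n>" headers and port lines printed by
# the loop on stdin, and shows a PID header only when a port line follows it.
only_with_ports() {
    awk '
        /^PID:/   { pid = $0; next }          # remember the latest PID header
        pid != "" { print pid; pid = "" }     # first port line: print header once
                  { print }                   # print the port line itself
    '
}

# On the host itself this would be, for example:
#   for i in `ps -A | grep -v PID | awk "{print \\$1}"`; do
#       echo PID:$i; pfiles $i | grep port
#   done | only_with_ports
```

Processes without open ports simply drop out of the listing.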

Updated design 

I got tired of the narrow columns of the original template that I used to build this blog, so I finally went in and modified the CSS to make the columns wider. This should make the code easier to read and copy. According to Google Analytics, 99.7% of my visitors have a screen wider than 800 pixels, so I felt safe expanding it a bit. Let me know if something doesn’t look right with the new layout.

Solaris 8: Bad Superblock at block 16 

This weekend I got the joy of rebuilding a Solaris 8 server from scratch with backups. I had another server of the same hardware available to use for the restore, except it had Solaris 9 mirrored across 2 hard drives. Someday, I’ll go through and document the whole process, but basically I broke the mirror, repartitioned the second drive, mounted the new partitions inside of the Solaris 9 environment, restored the data and rebooted off the drive with the restored data.

The interesting thing that I found that I wanted to post about was the corruption of the Solaris 8 superblock that was caused by mounting the partitions within Solaris 9. There were changes made in fsck on Solaris 9 that made the superblock incompatible with Solaris 8. When you boot off of a Solaris 8 partition that has been mounted by Solaris 9, you’ll get the following errors:

The / file system (/dev/rdsk/c0t2d0s0) is being checked.
/dev/rdsk/c0t2d0s0: BAD SUPERBLOCK AT BLOCK 16: BAD VALUES IN SUPER BLOCK
/dev/rdsk/c0t2d0s0: USE AN ALTERNATE SUPERBLOCK TO SUPPLY NEEDED INFORMATION;
/dev/rdsk/c0t2d0s0: e.g. fsck [-F ufs] -o b=# [special …]
/dev/rdsk/c0t2d0s0: where # is the alternate super block. SEE fsck_ufs(1M).

/dev/rdsk/c0t2d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

WARNING – Unable to repair the / filesystem. Run fsck
manually (fsck -F ufs /dev/rdsk/c0t2d0s0). Exit the shell when
done to continue the boot process.

Of course, the drive path may be different, etc. The solution to this is simple: run fsck and restore the superblock from one of its backups. In this case I restored from block 32:

fsck -F ufs -y -o b=32 /dev/rdsk/c0t2d0s0

Run this on each of the affected partitions (all the partitions that you mounted) and you are good to go.
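
With several affected partitions, it can help to generate the commands first and review them before running anything. The sketch below only prints the commands; print_fsck_cmds is a hypothetical helper name, and it assumes backup superblock 32 is usable on every slice, which may not hold for your file systems (newfs -N on the raw device should list the backup locations without creating anything).

```shell
# Hypothetical dry-run helper: print, but do not run, the fsck command for
# each raw slice given as an argument. Backup superblock 32 is the one used
# above and is an assumption for any other slice.
print_fsck_cmds() {
    for dev in "$@"; do
        echo "fsck -F ufs -y -o b=32 $dev"
    done
}

# Example with a second, made-up slice name:
#   print_fsck_cmds /dev/rdsk/c0t2d0s0 /dev/rdsk/c0t2d0s3
```

Once the list looks right, run each printed command by hand or pipe the output to sh.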