Ever since my storage system was built there was one thing that annoyed me. The 2.5" hard disk drive that houses the operating system itself was lifted from an old notebook and had the annoying property of parking its heads after five seconds of inactivity. Since ZFS writes to the disk quite often and regularly, this led to a constant cycle of parking and unparking. This was certainly not helping the disk's life span, it made an annoying noise, and it caused small system hangs whenever the disk had to unpark its heads to read some data.
Under Linux one could use hdparm to instruct the disk not to park its heads, but unfortunately a program offering this functionality seems to be absent under Solaris. Hence the plan to replace the disk with one that has a more sensible approach to head parking.
This turned out to be an interesting endeavour.
The general problem of replacing the disk holding the rpool is common enough that the excellent ZFS troubleshooting guide has a section on doing this. The general plan of action is as follows:
- Insert the replacement disk into an available slot
- Create a partition spanning the whole disk
- Create boot and data slices
- Attach the new disk as a mirror to the rpool
- Wait for the resilver to finish
- Install grub on the new disk
- Try to boot from the new disk
- Detach the old disk from the rpool
- Remove the old disk
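The steps above map roughly onto the commands below. This is only a sketch: the Python snippet builds and prints the command lines instead of running them, and the device names c1t0d0 (old disk) and c2t0d0 (new disk) are hypothetical placeholders. Copying the slice table with prtvtoc and fmthard assumes both disks have the same size; otherwise the slices have to be created by hand with format.

```python
def rpool_replacement_plan(old_disk, new_disk, pool="rpool"):
    """Return (description, command) pairs for replacing a root pool disk.

    Nothing is executed here; the commands are meant to be reviewed and
    then run by hand as root. The device names are supplied by the caller.
    """
    return [
        ("Create a Solaris fdisk partition spanning the whole new disk",
         f"fdisk -B /dev/rdsk/{new_disk}p0"),
        ("Copy the slice table from the old disk (same-size disks only)",
         f"prtvtoc /dev/rdsk/{old_disk}s2 | fmthard -s - /dev/rdsk/{new_disk}s2"),
        ("Attach the new disk as a mirror to the root pool",
         f"zpool attach {pool} {old_disk}s0 {new_disk}s0"),
        ("Check resilver progress; repeat until it has finished",
         f"zpool status {pool}"),
        ("Install GRUB on the new disk",
         f"installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/{new_disk}s0"),
        ("After a successful test boot from the new disk, detach the old one",
         f"zpool detach {pool} {old_disk}s0"),
    ]


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    for description, command in rpool_replacement_plan("c1t0d0", "c2t0d0"):
        print(f"# {description}")
        print(command)
```

Test-booting from the new disk and physically removing the old one are manual steps and therefore only appear as comments in the plan.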
This is all very sensible, and it all works as advertised. In my case there is, however, a last step not on the list above:
- Put the new disk on the controller the old disk was attached to
The reason is that the case I used has only one internal 2.5" hard disk drive slot. The new disk was prepared using an external USB-IDE converter module. This worked just fine; the BIOS is even able to boot from the USB disk. As long as the new disk remained attached to the USB converter everything was fine, even after the old (internal) disk was removed from the rpool. But putting the new disk into the case caused Solaris to roll over and die early in the boot process because it could not find its rpool disk. The error message indicated that it was trying to read the pool from the external USB device (which no longer existed at this point).
Investigation (and much swearing) turned up that this information was passed by GRUB to the Solaris kernel.
Solaris uses a patched GRUB version which understands ZFS and has some string replacement magic built in. Every (non-failsafe) boot entry contains a line similar to this:
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
$ZFS-BOOTFS is replaced by GRUB with the following information:
- The name of the root pool (usually rpool) and the number of the dataset that contains the root file system (there may be several BEs)
- The device path of the disk this GRUB instance was read from
The actual command line that is executed by GRUB thus looks something like this:
kernel /platform/i86pc/kernel/$ISADIR/unix -B zfs-bootfs=rpool/328 \
bootpath="/pci@0,0/pci8086,2942@1c,1/pci-ide@0/ide@0/cmdk@0,0:a"
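To make the substitution concrete, here is a minimal Python sketch of what the expansion amounts to. It is not GRUB's actual code; the function name expand_zfs_bootfs and its parameters are made up for illustration, and the output format simply follows the example command line shown above. $ISADIR is left untouched, as in that example.

```python
def expand_zfs_bootfs(menu_line, pool, dataset_id, bootpath):
    """Mimic the $ZFS-BOOTFS substitution described above (illustration only).

    pool and dataset_id identify the root file system; bootpath is the
    device path recorded on the disk GRUB was loaded from.
    """
    expansion = f'zfs-bootfs={pool}/{dataset_id} bootpath="{bootpath}"'
    # "kernel$" marks a line that needs expansion; the result is a plain
    # "kernel" command with the placeholder filled in.
    line = menu_line.replace("kernel$", "kernel", 1)
    return line.replace("$ZFS-BOOTFS", expansion)


if __name__ == "__main__":
    entry = "kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS"
    path = "/pci@0,0/pci8086,2942@1c,1/pci-ide@0/ide@0/cmdk@0,0:a"
    print(expand_zfs_bootfs(entry, "rpool", 328, path))
```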
The interesting part here is the bootpath parameter. This is the device that Solaris will try to mount the rpool from. Even if the rpool consists of several mirror devices, only one is used in the initial boot process. Where does GRUB get the device path from? It is read from the rpool header on the disk GRUB was loaded from. Every ZFS pool disk records the device path it was last found under. This usually does not matter much: a RAIDZ will still mount if you swap the disks around while the machine is off. The boot process, however, relies on the rpool disks not wandering around. My new disk still had the USB device path embedded, which GRUB read and passed to the kernel, which then failed to find the disk.
Fixing this turns out to be easy: boot into failsafe mode with the new disk on its final connector. Failsafe mode will search for rpools and BEs on the system and offer to mount one of them. Pick the right one, then reboot. This is enough to get the current (and correct) device path embedded into the rpool. The next (non-failsafe) boot will thus pick up the correct device path and allow the boot to continue.
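The failure and the fix can be summarised in a toy model. The Python sketch below is purely illustrative and does not reflect any real ZFS or GRUB data structure: the Disk class and the three functions are invented for this example, and the USB device path is made up (the IDE path is the one from the command line above).

```python
# Made-up USB path for illustration; the IDE path is taken from the text.
USB_PATH = "/pci@0,0/pci8086,2942@1d,7/storage@1/disk@0,0:a"
IDE_PATH = "/pci@0,0/pci8086,2942@1c,1/pci-ide@0/ide@0/cmdk@0,0:a"


class Disk:
    def __init__(self, label_path):
        # Device path recorded in the pool label when the pool was last mounted.
        self.label_path = label_path


def grub_bootpath(disk):
    # GRUB hands the kernel whatever path the label on its boot disk records.
    return disk.label_path


def kernel_can_mount(bootpath, attached):
    # The kernel only looks at the given path; it does not search other devices.
    return bootpath in attached


def failsafe_mount(attached):
    # Failsafe mode searches all devices and mounts the pool where it finds it,
    # which rewrites the label with the current device path.
    for path, disk in attached.items():
        disk.label_path = path


if __name__ == "__main__":
    disk = Disk(USB_PATH)        # prepared on the USB converter
    attached = {IDE_PATH: disk}  # then moved to the internal connector
    print(kernel_can_mount(grub_bootpath(disk), attached))
    failsafe_mount(attached)
    print(kernel_can_mount(grub_bootpath(disk), attached))
```

The first check fails because the label still points at the USB path; after the simulated failsafe mount the label matches the internal connector and the second check succeeds.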
The moral of an afternoon spent in the innards of the Solaris boot process is thus: do not swap your rpool disk around.