Migrating Proxmox/ZFS from raidz1 to mirrors
The Proxmox box at my home is also being used as a NAS, with Samba and NFS doing the sharing. It had 4 WD Red 6TB PMR drives, in a raidz1 configuration, giving me a net capacity of 18TB (give or take a few).
This thing houses backups of other machines in the house and of machines in the cloud, several VMs, photos in RAW, videos created by my partner for her side gig, … and it was starting to get full (only 2TB remaining).
Cleaning up some cruft got me back to 5TB free, but that was only going to shrink over time. Time to do something about it!
I picked up two WD Red Plus 14TB drives to add to the system, as a mirror. Should you buy these drives, make sure they’re the Plus or Pro variants, as the normal ones now use Shingled Magnetic Recording (SMR), which just does not play nice with ZFS.
As always: make sure you have (working) backups!
When I picked raidz1, I did a rough conversion from Synology Hybrid RAID (SHR) to ZFS: single-disk fault tolerance, high net storage capacity. I had also heard that something like raidz expansion would land at some point in the future, so that seemed fine.
Fast forward nearly two years: I need to add storage, raidz1 expansion still hasn’t landed, I/O for the VMs is slow (adding a SLOG made that better), scrubbing takes forever… not ideal.
So, perhaps, conversion to mirrors would be a way to go forward? But how to do this without losing my data?
ZFS makes that really easy with snapshots and zfs send - zfs receive :)
Additionally, moving my Proxmox root off of the data disks also seemed like a Good Idea™.
Migrating the boot (EFI) partition
Proxmox boots using EFI on my system. This means that the EFI firmware on the mainboard will look for an EFI system partition (ESP) where the bootloader is stored. In the case of Proxmox, systemd-boot is used.
I created a new 512MB EFI partition on both SLOG SSDs, and used proxmox-boot-tool to format them and add them to the list of partitions it needs to keep in sync. This way, whenever a disk dies, you can still boot off another.
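The partitions themselves can be created with sgdisk; a rough sketch (assuming partition number 2 is still free on both SSDs, which matches the -part2 device names used below):
# sgdisk -n 2:0:+512M -t 2:EF00 -c 2:"EFI system partition" /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN
# sgdisk -n 2:0:+512M -t 2:EF00 -c 2:"EFI system partition" /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN
With the partitions in place, proxmox-boot-tool takes over: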
# proxmox-boot-tool format /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part2
# proxmox-boot-tool format /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part2
# proxmox-boot-tool init /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part2
# proxmox-boot-tool init /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part2
Profit!
I actually did the same on the two new 14TB drives, so that any drive contains a copy of my bootloader.
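You can verify which ESPs are registered and being kept in sync with:
# proxmox-boot-tool status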
Migrating the root filesystem
The SLOG device I picked has a total capacity of 100GB, of which at this point only 8GB was being used. I opted to create another 30GB mirrored zpool on the SSDs, called syspool.
Once the pool was created, it was just a question of creating a snapshot and using zfs send | zfs receive on the zfs datasets. Ideally also using --props, so zfs send sends along all properties of the zfs datasets, and -u, so zfs receive doesn’t automatically mount the new dataset.
The zfs datasets I decided to copy were rpool/ROOT and rpool/ROOT/pve-1. Those now live as syspool/ROOT and syspool/ROOT/pve-1.
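Roughly, that boils down to something like this (the @migrate snapshot name and the -part3 partitions for syspool are illustrative, adjust to your own layout):
# zpool create syspool mirror /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part3 /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part3
# zfs snapshot -r rpool/ROOT@migrate
# zfs send --props rpool/ROOT@migrate | zfs receive -u syspool/ROOT
# zfs send --props rpool/ROOT/pve-1@migrate | zfs receive -u syspool/ROOT/pve-1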
Once that’s done, the final tasks were mounting the new root dataset, making sure that /etc/kernel/cmdline was updated to reflect the new zpool name, and rebooting.
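On a default Proxmox ZFS install that file contains something like root=ZFS=rpool/ROOT/pve-1 boot=zfs, so after the move it should point at the new pool instead, followed by a proxmox-boot-tool refresh to rewrite the boot entries on all registered ESPs:
# cat /etc/kernel/cmdline
root=ZFS=syspool/ROOT/pve-1 boot=zfs
# proxmox-boot-tool refresh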
I did run into a problem where ZFS didn’t automatically import the new pool at boot, but that was remedied by updating the cache file, which you can do by running zpool set cachefile=/etc/zfs/zpool.cache syspool.
Migrating the other data
I created a new zpool called datapool on a mirror of the two new 14TB drives, and used the same zfs send | zfs receive magic to move the data over, slowly emptying the raidz1. I did have to temporarily move some data from the server to several other machines to be able to fit it all.
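Creating that pool is a one-liner; something along these lines, using the partitions that show up in the status output further down (ashift=12 is my assumption here, matching the 4K physical sectors of these drives):
# zpool create -o ashift=12 datapool mirror /dev/disk/by-id/ata-WDC_WD140EFGX-68B0GN0_Y5KYTGXC-part2 /dev/disk/by-id/ata-WDC_WD140EFGX-68B0GN0_Y6G4J06C-part2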
If you’re lazy (like a good IT’er), you might like Jim Salter’s sanoid/syncoid tools. These allow for very easy zfs send | zfs receive-ing, both locally and remotely.
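For example, recursively copying a dataset tree from the old pool to the new one with syncoid could look like this (the dataset names are placeholders):
# syncoid --recursive rpool/data datapool/data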
Once the old pool was empty, I destroyed the old zpool (you did check those backups, right?), and added the old drives back as mirrors.
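Adding a pair of the old (repartitioned) drives back as an extra mirror vdev is a single command; for the first pair that would be along the lines of:
# zpool add datapool mirror /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX11D86HUYRT-part3 /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX11DC7JHEKP-part3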
I ended up with this topology:
# zpool status datapool
  pool: datapool
 state: ONLINE
config:

	NAME                                                   STATE     READ WRITE CKSUM
	datapool                                               ONLINE       0     0     0
	  mirror-0                                             ONLINE       0     0     0
	    ata-WDC_WD140EFGX-68B0GN0_Y5KYTGXC-part2           ONLINE       0     0     0
	    ata-WDC_WD140EFGX-68B0GN0_Y6G4J06C-part2           ONLINE       0     0     0
	  mirror-2                                             ONLINE       0     0     0
	    ata-WDC_WD60EFRX-68L0BN1_WD-WX11D86HUYRT-part3     ONLINE       0     0     0
	    ata-WDC_WD60EFRX-68L0BN1_WD-WX11DC7JHEKP-part3     ONLINE       0     0     0
	  mirror-3                                             ONLINE       0     0     0
	    ata-WDC_WD60EFRX-68L0BN1_WD-WX51DB7N60A5-part3     ONLINE       0     0     0
	    ata-WDC_WD60EFRX-68L0BN1_WD-WX61D96AX3DV-part3     ONLINE       0     0     0
	logs
	  mirror-1                                             ONLINE       0     0     0
	    ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part4  ONLINE       0     0     0
	    ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part4  ONLINE       0     0     0
so there are now four mirrors: three for the datapool, and one for the SLOG.
After extending the pool, it’s a matter of making sure all data lands back on it.
Rebalancing the datapool
Doing things this way has a drawback: while all the space was there, the two 2x6TB mirrors were empty and the 2x14TB mirror was basically full, so I wasn’t getting the full benefit of being able to spread the I/O over all six disks.
The trick to fix this lies once again with snapshots and zfs send | zfs receive. Using this makes ZFS read the data and write it back to the pool, spreading the data blocks over all the available disks.
To rebalance, create a snapshot and send it to a new location on the datapool (using the same parameters as before). Afterwards (and after checking your data) you can destroy the old copy and use zfs rename to put the dataset back in its old location.
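With a hypothetical dataset called datapool/media, that dance looks roughly like this (double-check the new copy before destroying anything):
# zfs snapshot -r datapool/media@rebalance
# zfs send --props datapool/media@rebalance | zfs receive -u datapool/media-new
# zfs destroy -r datapool/media
# zfs rename datapool/media-new datapool/media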
Final cleaning up
After everything is said and done, don’t forget to clean up any stale snapshots lying around. zfs list datapool -r -t snapshot is one way to visualize them.
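Getting rid of a leftover one (sticking with the hypothetical dataset from the rebalancing step) is then just:
# zfs destroy datapool/media@rebalance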