The Proxmox box at my home is also being used as a NAS, with Samba and NFS doing the sharing. It had 4 WD Red 6TB PMR drives, in a raidz1 configuration, giving me a net capacity of 18TB (give or take a few).

This thing houses backups of other machines in the house and of machines in the cloud, several VMs, photos in RAW, videos created by my partner for her side gig, … and it was starting to get full (2TB remaining).

Cleaning up some cruft got me back to 5TB free, but still, that was going to decrease over time. Time to do something about it!

I picked up two WD Red Plus 14TB drives to add to the system, as a mirror. Should you buy these drives, make sure they’re the Plus or Pro variants, as the normal ones now use Shingled Magnetic Recording (SMR), which just does not play nice with ZFS.

As always: make sure you have (working) backups!

When picking raidz1, I did a rough conversion from Synology Hybrid Raid (SHR) to ZFS: single-disk fault tolerance, high net storage capacity. I had also heard that something like raidz expansion would land in the future, so... OK.

Fast forward nearly two years: I need to add storage, raidz1 expansion still hasn't landed, I/O is slow for VMs (adding a SLOG made that better), scrubbing takes forever… not ideal.

So, perhaps, conversion to mirrors would be a way to go forward? But how to do this without losing my data?

ZFS makes that really easy with snapshots and zfs send | zfs receive :)

Additionally, moving my Proxmox root off the data disks seemed like a Good Idea™.

Migrating the boot (EFI) partition

Proxmox boots using EFI on my system. This means the EFI firmware on the mainboard looks for an EFI system partition where the bootloader is stored. In the case of Proxmox, systemd-boot is used.

I created a new EFI partition on both SLOG SSDs (512MB), and used proxmox-boot-tool to format them and add them to the list of partitions it keeps in sync. This way, whenever a disk dies, you can still boot off another.
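Carving out those 512MB partitions can be done with sgdisk. Something along these lines should do it (partition number 2 chosen here to match the -part2 devices used below):

# sgdisk -n 2:0:+512M -t 2:EF00 /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN
# sgdisk -n 2:0:+512M -t 2:EF00 /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN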

# proxmox-boot-tool format /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part2
# proxmox-boot-tool format /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part2

# proxmox-boot-tool init /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part2
# proxmox-boot-tool init /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part2

Profit!

I actually did the same on the two new 14TB drives, so that any drive contains a copy of my bootloader.
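To double check that every ESP is registered and kept in sync, proxmox-boot-tool can report on its configuration:

# proxmox-boot-tool status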

Migrating the root filesystem

The SLOG device I picked has a total capacity of 100GB, of which only 8GB was in use at this point. I opted to create another mirrored zpool of 30GB on the SSDs, called syspool.
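Creating that pool amounts to something like this (the partition number here is illustrative; use whatever partition you carved out for it):

# zpool create -o ashift=12 syspool mirror \
    /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part3 \
    /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part3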

Once the pool was created, it was just a question of creating a snapshot and using zfs send | zfs receive on the ZFS datasets. Ideally also use --props so zfs send sends along all properties of the datasets, and -u so zfs receive doesn't automatically mount the new dataset.

The ZFS datasets I decided to copy were rpool/ROOT and rpool/ROOT/pve-1. Those now live as syspool/ROOT and syspool/ROOT/pve-1.
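In terms of commands, that boils down to something like this (the snapshot name is just an example):

# zfs snapshot -r rpool/ROOT@migrate
# zfs send --props rpool/ROOT@migrate | zfs receive -u syspool/ROOT
# zfs send --props rpool/ROOT/pve-1@migrate | zfs receive -u syspool/ROOT/pve-1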

Once that was done, the final tasks were mounting the new root dataset, updating /etc/kernel/cmdline to reflect the new zpool name, and rebooting.
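For reference, the updated kernel command line ends up looking something like this (your file may carry extra parameters), followed by a refresh so every ESP picks up the change:

# cat /etc/kernel/cmdline
root=ZFS=syspool/ROOT/pve-1 boot=zfs
# proxmox-boot-tool refresh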

I did run into a problem where ZFS didn't automatically import my new pool, but that was remedied by updating the cache file, which you can do by running zpool set cachefile=/etc/zfs/zpool.cache syspool.

Migrating the other data

I created a new zpool called datapool on a mirror of the two new 14TB drives, and used the same zfs send | zfs receive magic on them to move the data over, slowly emptying the raidz1. I did have to move data from the server to several other machines to be able to fit it all.
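Creating that pool looks something like this (the partition layout matches what you'll see in the zpool status output further down):

# zpool create -o ashift=12 datapool mirror \
    /dev/disk/by-id/ata-WDC_WD140EFGX-68B0GN0_Y5KYTGXC-part2 \
    /dev/disk/by-id/ata-WDC_WD140EFGX-68B0GN0_Y6G4J06C-part2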

If you're lazy (like any good IT'er), you might like Jim Salter's sanoid/syncoid tooling. It makes zfs send | zfs receive-ing very easy, whether local or remote.
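With syncoid, the whole snapshot/send/receive dance becomes a one-liner (the dataset names here are just placeholders):

# syncoid --recursive rpool/data datapool/data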

Once the old pool was empty, I destroyed the old zpool (you did check those backups, right?), and added the old drives back as mirrors.
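Destroying the old pool and re-adding its disks as two extra mirror vdevs goes something like this (assuming the old pool was still called rpool; the partition numbers match the status output below):

# zpool destroy rpool
# zpool add datapool mirror \
    /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX11D86HUYRT-part3 \
    /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX11DC7JHEKP-part3
# zpool add datapool mirror \
    /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX51DB7N60A5-part3 \
    /dev/disk/by-id/ata-WDC_WD60EFRX-68L0BN1_WD-WX61D96AX3DV-part3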

I ended up with this topology:

# zpool status datapool
  pool: datapool
 state: ONLINE
config:

        NAME                                                   STATE     READ WRITE CKSUM
        datapool                                               ONLINE       0     0     0
          mirror-0                                             ONLINE       0     0     0
            ata-WDC_WD140EFGX-68B0GN0_Y5KYTGXC-part2           ONLINE       0     0     0
            ata-WDC_WD140EFGX-68B0GN0_Y6G4J06C-part2           ONLINE       0     0     0
          mirror-2                                             ONLINE       0     0     0
            ata-WDC_WD60EFRX-68L0BN1_WD-WX11D86HUYRT-part3     ONLINE       0     0     0
            ata-WDC_WD60EFRX-68L0BN1_WD-WX11DC7JHEKP-part3     ONLINE       0     0     0
          mirror-3                                             ONLINE       0     0     0
            ata-WDC_WD60EFRX-68L0BN1_WD-WX51DB7N60A5-part3     ONLINE       0     0     0
            ata-WDC_WD60EFRX-68L0BN1_WD-WX61D96AX3DV-part3     ONLINE       0     0     0
        logs
          mirror-1                                             ONLINE       0     0     0
            ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part4  ONLINE       0     0     0
            ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part4  ONLINE       0     0     0

So there are now four mirrors: three holding data, and one serving as the SLOG.

After extending the pool, it’s a matter of making sure all data lands back on it.

Rebalancing the datapool

Doing things this way has a drawback: while all the space was there, the two 6TB mirrors were empty and the 14TB mirror was full, so I wasn't getting the benefit of spreading I/O over all six disks.

The trick to fix this lies once again with snapshots and zfs send | zfs receive. This makes ZFS read the data and write it back to the pool, spreading the data blocks over all the available disks.

To rebalance, create a snapshot and send it to a new location on the datapool (using the same parameters as before). Afterwards (and after checking your data) you can destroy the old copy and use zfs rename to put the dataset back in its old location.
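For a single dataset, that could look like this (dataset and snapshot names are just examples):

# zfs snapshot datapool/photos@rebalance
# zfs send --props datapool/photos@rebalance | zfs receive -u datapool/photos_new
# zfs destroy -r datapool/photos
# zfs rename datapool/photos_new datapool/photos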

Final cleanup

After everything is said and done, don't forget to clean up any stale snapshots lying around. zfs list -r -t snapshot datapool is one way to visualize them.
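Cleaning them up is a zfs destroy per snapshot (again, the names are just placeholders):

# zfs destroy datapool/photos@rebalance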
