Enter Proxmox and ZFS

6 minute read

Following up on my Exit Synology post, I’ve decided it’s time to move from a consumer grade NAS to something a bit more sturdy. I’ve also been running out of memory on the Synology NAS with all the things that I wanted to run on it… so time for something else.

Functionalities I wanted to replace:

  • NAS
  • Hypervisor
  • Container runtime
  • VPN endpoint
  • file sync (like dropbox)
  • Cloud backup
  • Backup target for other devices

I went through a few iterations before deciding on my final setup:

First iteration: FreeNAS + bhyve hypervisor

Quoting the FreeNAS website:

FreeNAS is an operating system that can be installed on virtually any hardware platform to share data over a network. FreeNAS is the simplest way to create a centralized and easily accessible place for your data. Use FreeNAS with ZFS to protect, store, and back up all of your data. FreeNAS is used everywhere, for the home, small business, and the enterprise.

This seemed like a good solution for the NAS part of the equation. It’s built on FreeBSD, offers ZFS for storage, a bunch of plugins to extend the functionalities of FreeNAS beyond what the barebones offers… a bit like Synology does.

For the hypervisor part, it’s less ideal.
bhyve, the FreeBSD hypervisor, isn’t quite as optimized yet as many other hypervisors out there, and tests show that there is quite a performance penalty still - this is getting a lot better in FreeBSD 12, but FreeNAS is still based on 11 (right now)

FreeNAS also has no solution for running containers except for running them in a VM. While this isn’t an issue perse, it does require you to actually run another layer of virtualisation in between.

Second iteration: VMWare ESXi + FreeNAS VM

So, to solve the virtualisation issue, there was the halfbaked idea of running VMWare’s ESXi as a hypervisor, run FreeNAS as a VM and share the storage back to ESXi for running VM’s on.

This also introduces somewhat a chicken-and-egg issue - FreeNAS needs to be started before the other VM’s, and other VM’s need to be shutdown before stopping FreeNAS. Chaos, panic and disorder - my work is done!

Problems:

  • ESXi is a proprietary product by VMWare. The free version comes with no support, and you have no guarantee that they won’t discontinue some hardware support.
  • ESXi needs to run on something. A USB thumbdrive works, but if you want some reliability it’s advisable to use some form of hardware RAID. ESXi doesn’t do software RAID.
  • For FreeNAS PCI passthrough of the storage controller is required - which means I do need to have two storage controllers - one for ESXi, one for FreeNAS.
  • Using NFS (or iSCSI) for ESXi datastores means that you need to run those shared filesystems in sync mode.
    Sync mode is slow (it needs to actively sync every change to disk), so you need an additional SLOG device.
    The SLOG device also needs to have power loss protection, because you need to be sure the writes are consistent.
  • The RAM allocated to FreeNAS is ‘lost’ to the rest of the system.
  • ESXi doesn’t do containers. Running a VM with a container daemon is the only solution.

Third iteration: Proxmox and native ZFS

From the Proxmox website:

Proxmox VE is a complete open-source platform for all-inclusive enterprise virtualization that tightly integrates KVM hypervisor and LXC containers, software-defined storage and networking functionality on a single platform, and easily manages high availability clusters and disaster recovery tools with the built-in web management interface.

This is basically Debian, with a hypervisor GUI, running on top of ZFS. It doesn’t offer anything of the other things, but it being Debian, it’s not that hard to make it do all the things that I want it to do.

  • NAS functionality will be done with ZFS and sharing through Samba and NFS.
  • Hypervisor - Proxmox covers that base: it supports both lightweight Linux Containers (LXC), or full fledged VM’s using KVM.
  • For containers there’s the docker daemon.

Final configuration

In the end I settled with Proxmox on top of ZFS, running on a raid-z1 configuration with my WD Red drives.

Hardware

The hardware for the build is:

Software

  • Proxmox (both LXC and VM’s)
  • Docker CE runtime
  • Samba
  • NFS

Proxmox / ZFS config

Proxmox itself is a fairly standard installation. The changes made were:

  • Since I’m using posixacl, I set ZFS xattr property to sa. This will result in the POSIX ACL being stored more efficiently on disk.
    zfs set xattr=sa rpool
    
  • Set the atime proprty to off on the entire datapool. Access time updates cause useless IOPS.
    zfs set atime=off rpool
    
  • Reduce the ZFS kernel module parameter spa_asize_inflation from the default 24 to 6. This impacts the calculations ZFS does to avoid overrunning quota, but causes performance issues when using ‘smallish’ datasets (my LXC containers have 10GB). Discussion on Github issue 10373
    echo 6 > /sys/module/zfs/parameters/spa_asize_inflation
    
  • Set the redundant_metadata property to most on the datasets where it doesn’t matter.

    When set to most, ZFS stores an extra copy of most types of metadata. This can improve performance of random writes, because less metadata must be written. In practice, at worst about 100 blocks (of recordsize bytes each) of user data can be lost if a single on-disk block is corrupt. The exact behavior of which metadata blocks are stored redundantly may change in future releases.

    zfs set redundant_metadata=most rpool/...
    
  • Activate the powersave CPU scaling governor. Saving money is a good thing.
    echo "powersave" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
  • Defining new storage endpoints in /etc/pve/storage.cfg for backups and images
  • Configuring the main bridge vmbr0 to be vlan aware, and defining a new vmbr0.13 interface to get the specific VLAN on the machine running Proxmox.
    auto vmbr0 vmbr0.13
    
    iface vmbr0.13 inet static
          address 192.168.0.2
          netmask 255.255.255.0
          gateway 192.168.0.1
    
    iface vmbr0 inet manual
          bridge_vlan_aware yes
          bridge_stp off
          bridge_fd 0
          bridge_ports enp0s31f6
    
    iface enp0s31f6 inet manual
    

Linux Containers

I’m running a few Linux Containers (LXC), which are hosting

Virtual Machines

At the moment only one VM is running, which hosts my WireGuard endpoint. Reason why this runs in a VM is because it requires kernel modules, and I don’t want to add anything extra to my main Proxmox installation.

Docker configuration

Docker has been configured to use the zfs storage driver, which makes better use of the capabilities of ZFS. Configuring this is as easy as adding

{
    "storage-driver": "zfs"
}

to /etc/docker/daemon.json.

For networking I’m using both a user-defined bridge, which has several advantages over using the default bridge (automatic DNS resolution between containers for one), and macvlan to assign external (from Docker’s point of view) IP’s to docker containers.

The macvlan setup is simple:

docker network create \ 
  --driver macvlan \ 
  --gateway 192.168.0.1 \  
  --subnet 192.168.0.0/24 \ 
  --ip-range 192.168.0.128/25 \ 
  --opt parent=vmbr0.13 \ 
  macvlan-lan

For the actual containers, I’m running a bunch of them:

Comments