Enter Proxmox and ZFS
Following up on my Exit Synology post, I’ve decided it’s time to move from a consumer-grade NAS to something a bit more sturdy. I’ve also been running out of memory on the Synology NAS with all the things I wanted to run on it… so, time for something else.
Functionalities I wanted to replace:
- NAS
- Hypervisor
- Container runtime
- VPN endpoint
- File sync (like Dropbox)
- Cloud backup
- Backup target for other devices
I went through a few iterations before deciding on my final setup:
First iteration: FreeNAS + bhyve hypervisor
Quoting the FreeNAS website:
FreeNAS is an operating system that can be installed on virtually any hardware platform to share data over a network. FreeNAS is the simplest way to create a centralized and easily accessible place for your data. Use FreeNAS with ZFS to protect, store, and back up all of your data. FreeNAS is used everywhere, for the home, small business, and the enterprise.
This seemed like a good solution for the NAS part of the equation. It’s built on FreeBSD, offers ZFS for storage, and comes with a bunch of plugins to extend FreeNAS beyond what the base system offers… a bit like Synology does.
For the hypervisor part, it’s less ideal.
bhyve, the FreeBSD hypervisor, isn’t quite as optimized yet as many other hypervisors out there, and tests show that there is still quite a performance penalty. This is getting a lot better in FreeBSD 12, but FreeNAS is (right now) still based on FreeBSD 11.
FreeNAS also has no solution for running containers except for running them in a VM. While this isn’t an issue per se, it does require you to actually run another layer of virtualisation in between.
Second iteration: VMware ESXi + FreeNAS VM
So, to solve the virtualisation issue, there was the half-baked idea of running VMware’s ESXi as a hypervisor, running FreeNAS as a VM, and sharing the storage back to ESXi for running VMs on.
This also introduces somewhat of a chicken-and-egg issue - FreeNAS needs to be started before the other VMs, and the other VMs need to be shut down before stopping FreeNAS. Chaos, panic and disorder - my work is done!
Problems:
- ESXi is a proprietary product by VMware. The free version comes with no support, and you have no guarantee that they won’t discontinue some hardware support.
- ESXi needs to run on something. A USB thumbdrive works, but if you want some reliability it’s advisable to use some form of hardware RAID. ESXi doesn’t do software RAID.
- For FreeNAS, PCI passthrough of the storage controller is required - which means I need two storage controllers: one for ESXi, one for FreeNAS.
- Using NFS (or iSCSI) for ESXi datastores means that you need to run those shared filesystems in sync mode. Sync mode is slow (it needs to actively sync every change to disk), so you need an additional SLOG device. The SLOG device also needs to have power loss protection, because you need to be sure the writes are consistent.
- The RAM allocated to FreeNAS is ‘lost’ to the rest of the system.
- ESXi doesn’t do containers. Running a VM with a container daemon is the only solution.
Third iteration: Proxmox and native ZFS
From the Proxmox website:
Proxmox VE is a complete open-source platform for all-inclusive enterprise virtualization that tightly integrates KVM hypervisor and LXC containers, software-defined storage and networking functionality on a single platform, and easily manages high availability clusters and disaster recovery tools with the built-in web management interface.
This is basically Debian, with a hypervisor GUI, running on top of ZFS. It doesn’t offer any of the other things I need out of the box, but since it’s Debian, it’s not that hard to make it do everything I want:
- NAS functionality will be done with ZFS, shared through Samba and NFS (a rough sketch follows after this list).
- Hypervisor - Proxmox covers that base: it supports both lightweight Linux Containers (LXC) and full-fledged VMs using KVM.
- For containers there’s the docker daemon.
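As a rough illustration of the sharing side (not necessarily my exact configuration), assuming a dataset called rpool/share and the standard Debian Samba and NFS server packages:
# Create a dataset to share (name is just an example)
zfs create rpool/share

# NFS: export it via the ZFS sharenfs property (requires nfs-kernel-server);
# options can be added to restrict access instead of plain "on"
zfs set sharenfs=on rpool/share
And a minimal share definition in /etc/samba/smb.conf:
[share]
    path = /rpool/share
    read only = no
    valid users = @users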
Final configuration
In the end I settled on Proxmox on top of ZFS, running in a RAID-Z1 configuration with my WD Red drives.
Hardware
The hardware for the build is:
- Sharkoon SilentStorm SFX 500 Gold PSU
- Fractal Node 804 case
- Intel Xeon 1230v5 CPU
- Cooler Master Hyper 212 EVO CPU cooler
- 2 sticks of Kingston 16GB ECC DDR4 RAM
- ASRock Rack C236 WS mainboard
- 4 WD Red 6TB PMR drives
Software
- Proxmox (both LXC and VMs)
- Docker CE runtime
- Samba
- NFS
Proxmox / ZFS config
Proxmox itself is a fairly standard installation. The changes made were:
- Since I’m using posixacl, I set the ZFS xattr property to sa. This results in the POSIX ACLs being stored more efficiently on disk.
zfs set xattr=sa rpool
- Set the atime property to off on the entire data pool. Access time updates cause useless IOPS.
zfs set atime=off rpool
- Reduce the ZFS kernel module parameter spa_asize_inflation from the default 24 to 6. This parameter affects the calculations ZFS does to avoid overrunning quotas, and the default causes performance issues when using ‘smallish’ datasets (my LXC containers have 10GB). Discussion on GitHub issue 10373.
echo 6 > /sys/module/zfs/parameters/spa_asize_inflation
- Set the redundant_metadata property to most on the datasets where it doesn’t matter. When set to most, ZFS stores an extra copy of most types of metadata. This can improve performance of random writes, because less metadata must be written. In practice, at worst about 100 blocks (of recordsize bytes each) of user data can be lost if a single on-disk block is corrupt. The exact behavior of which metadata blocks are stored redundantly may change in future releases.
zfs set redundant_metadata=most rpool/...
- Activate the powersave CPU scaling governor. Saving money is a good thing.
echo "powersave" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
- Defining new storage endpoints in /etc/pve/storage.cfg for backups and images (a sketch of what this can look like follows after this list).
- Configuring the main bridge vmbr0 to be VLAN-aware, and defining a new vmbr0.13 interface to get the specific VLAN on the machine running Proxmox.
auto vmbr0 vmbr0.13

iface vmbr0.13 inet static
    address 192.168.0.2
    netmask 255.255.255.0
    gateway 192.168.0.1

iface vmbr0 inet manual
    bridge_vlan_aware yes
    bridge_stp off
    bridge_fd 0
    bridge_ports enp0s31f6

iface enp0s31f6 inet manual
Linux Containers
I’m running a few Linux Containers (LXC), which are hosting
- Unifi Network Management Controller (on a different VLAN - see the sketch after this list)
- General purpose container (for some tools)
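Since the bridge is VLAN-aware, putting a container on a different VLAN is just a matter of tagging its virtual NIC. A sketch of what that looks like for a hypothetical container with VMID 101 on VLAN 20 (both numbers are examples):
# Tag the container's eth0 with VLAN 20 on the vlan-aware bridge
pct set 101 -net0 name=eth0,bridge=vmbr0,tag=20,ip=dhcp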
Virtual Machines
At the moment only one VM is running, which hosts my WireGuard endpoint. The reason this runs in a VM is that WireGuard requires kernel modules, and I don’t want to add anything extra to my main Proxmox installation.
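The WireGuard configuration itself is tiny. As a rough sketch (addresses, port and keys are placeholders, not my actual values), /etc/wireguard/wg0.conf on the VM looks something like:
[Interface]
# Tunnel address of the endpoint and the port it listens on
Address = 10.0.8.1/24
ListenPort = 51820
PrivateKey = <server-private-key>

[Peer]
# One peer section per client device
PublicKey = <client-public-key>
AllowedIPs = 10.0.8.2/32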
Docker configuration
Docker has been configured to use the zfs storage driver, which makes better use of the capabilities of ZFS. Configuring this is as easy as adding
{
"storage-driver": "zfs"
}
to /etc/docker/daemon.json.
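One caveat: the zfs storage driver only works when Docker’s data root (/var/lib/docker by default) lives on a ZFS dataset, so that dataset has to exist before the daemon starts. The dataset name below is just an example, and docker info shows whether the driver is actually in use:
# Dataset for Docker's data root (example name)
zfs create -o mountpoint=/var/lib/docker rpool/docker

# After restarting the daemon, verify the driver
docker info | grep "Storage Driver"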
For networking I’m using both a user-defined bridge, which has several advantages over using the default bridge (automatic DNS resolution between containers, for one), and macvlan to assign external (from Docker’s point of view) IPs to Docker containers.
The macvlan setup is simple:
docker network create \
--driver macvlan \
--gateway 192.168.0.1 \
--subnet 192.168.0.0/24 \
--ip-range 192.168.0.128/25 \
--opt parent=vmbr0.13 \
macvlan-lan
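Attaching a container to that network then looks roughly like this (the static IP and the PiHole image are just an example; without --ip, Docker picks an address from the --ip-range above):
docker run -d \
  --name pihole-lan \
  --network macvlan-lan \
  --ip 192.168.0.130 \
  pihole/pihole
One macvlan caveat to be aware of: the host itself can’t reach these containers over the parent interface unless you add a macvlan shim interface on the host.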
For the actual containers, I’m running a bunch of them:
- Watchtower to keep containers automatically up-to-date (dockerhub)
- Nginx as a reverse proxy in front of the containers - automatically supplying an internal hostname. This also removes the requirement to expose the container ports on the host itself (dockerhub). A rough sketch of such a proxy configuration is included at the end of this section.
- Heimdall as an application dashboard (dockerhub)
- Nextcloud for internal file sharing and collaboration (dockerhub)
- MariaDB for Nextcloud (dockerhub)
- BackupPC for agentless backups (dockerhub)
- Plex for streaming to devices (dockerhub)
- Portainer for easy Docker Container management (dockerhub)
- Grafana for fancy dashboards and graphs (dockerhub)
- InfluxDB as a datastore for Grafana (for data coming from Home Assistant) (dockerhub)
- Duplicati for compressed de-duplicated and encrypted backup to Backblaze B2 (dockerhub)
- PiHole for network-wide ad blocking (dockerhub)
I’m actually running a few PiHole instances - one per VLAN that is used for internet access.
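To round off the Nginx part: a sketch of what a single reverse-proxy server block can look like. The hostname, upstream container name and port are assumptions for the example - in practice these are generated automatically:
server {
    listen 80;
    server_name nextcloud.lan;

    location / {
        # The container name resolves through Docker's internal DNS
        # because both containers sit on the same user-defined bridge.
        proxy_pass http://nextcloud:80;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}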