
I’ve noticed that quite a few of my VM and NFS workloads are rather slow on my Proxmox box, because:

  1. it’s sitting on spinning rust (also known as hard disk drives)
  2. a lot of their writes are synchronous

Synchronous writes are writes where the application asks ZFS to commit the data to stable storage before the write call returns. That way the application can be sure the data will survive a power failure. (In comparison, with asynchronous writes ZFS returns as soon as the data is in its in-memory buffers, and flushes it out to disk at a later time.)
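To get a feel for the difference on your own box, you can time the same small writes with and without O_DSYNC. This is a rough sketch, assuming GNU coreutils dd and date on Linux. TARGET_DIR is a hypothetical variable: by default it is a fresh temporary directory (which may not be on ZFS at all), so you would point it at a directory on the dataset you actually care about.

```shell
#!/usr/bin/env bash
# Rough comparison of asynchronous vs synchronous writes.
# Assumes GNU coreutils (dd with oflag=dsync/iflag=fullblock, date with %N) on Linux.
set -euo pipefail

# Hypothetical scratch directory; point it at a directory on the dataset under test.
TARGET_DIR="${TARGET_DIR:-$(mktemp -d)}"
BLOCKS=256   # number of 4 KiB writes per run
ASYNC_FILE="$TARGET_DIR/async.bin"
SYNC_FILE="$TARGET_DIR/sync.bin"

# Run a command and print its wall-clock duration in milliseconds.
time_ms() {
  local start end
  start=$(date +%s%N)
  "$@"
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Asynchronous: each write() returns once the data is in memory.
# /dev/urandom is used so compression cannot shortcut all-zero blocks.
async_ms=$(time_ms dd if=/dev/urandom of="$ASYNC_FILE" bs=4096 count="$BLOCKS" iflag=fullblock status=none)

# Synchronous: oflag=dsync opens the file with O_DSYNC, so each write()
# only returns once the data has reached stable storage.
sync_ms=$(time_ms dd if=/dev/urandom of="$SYNC_FILE" bs=4096 count="$BLOCKS" iflag=fullblock oflag=dsync status=none)

echo "async: ${async_ms} ms, sync: ${sync_ms} ms for ${BLOCKS} x 4 KiB writes"
```

On a pool of hard disks without a SLOG you would expect the sync run to be noticeably slower, but the actual numbers depend entirely on your hardware and pool layout.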

There are plenty of good write-ups on the net; I can recommend the one at ServeTheHome, and an even better one over at Jim Salter’s page.

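To solve the problem of slow sync writes, you can add what is known as a SLOG: a Separate Log device. By default, ZFS records sync writes in its ZFS Intent Log (ZIL), which lives on the pool’s own (slow) disks; a SLOG moves the ZIL onto a dedicated, faster device. ZFS writes the sync data to the SLOG, gives the ‘all ok’ to the application, and later writes the data out to the pool disks in batches from memory, as part of its regular transaction groups. The copy on the SLOG is only read back if the system crashes before that happens.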

ZFS sync writes without a SLOG:

[Figure: sync write without SLOG]

ZFS sync writes with a SLOG:

[Figure: sync write with SLOG]

Typically (always?) a SLOG device will be some sort of flash memory, or Intel Optane.
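Why flash? A back-of-the-envelope calculation: if a writer waits for each sync write to complete before issuing the next, it cannot complete more than about 1000 / latency_ms writes per second. The sketch below works that out. The 7200 rpm disk and the 0.1 ms SSD latency are assumed figures for illustration, and the estimate ignores seek time, queueing and ZFS batching log writes from several writers.

```shell
#!/usr/bin/env bash
# Upper bound on sync writes per second for a writer that waits for each
# write to complete before issuing the next:  ops/s ≈ 1000 / latency_ms.
# Latency figures are assumptions; seek time and queueing are ignored.
set -euo pipefail

# Average rotational latency of a hard disk: half a revolution, in ms.
rotational_latency_ms() {
  awk -v rpm="$1" 'BEGIN { printf "%.2f\n", 60000 / rpm / 2 }'
}

# Maximum completed sync writes per second at a given latency (ms), truncated.
max_sync_iops() {
  awk -v ms="$1" 'BEGIN { printf "%d\n", 1000 / ms }'
}

hdd_ms=$(rotational_latency_ms 7200)  # 4.17 ms for a 7200 rpm disk
ssd_ms=0.1                            # assumed latency of an SSD with PLP

echo "7200 rpm HDD: ~${hdd_ms} ms -> at most $(max_sync_iops "$hdd_ms") sync writes/s"
echo "SSD log:      ~${ssd_ms} ms -> at most $(max_sync_iops "$ssd_ms") sync writes/s"
```

Even before seek time is added, the hard disk tops out at a couple of hundred sync writes per second for a single waiting writer, which is the gap a low-latency log device closes.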

This SLOG device needs to tick quite a few boxes:

  • needs to be FAST: faster than the rest of the media in your pool
  • needs to have high write endurance. A lot of writes will land on it, and consumer SSDs will wear out really quickly
  • needs to be able to deal with a power outage. If power goes out before it has had a chance to flush its buffers, you’re still hosed. This is usually called Power Loss Protection, or PLP
  • needs to be resilient. This means you can’t really settle for one physical device: use at least two, since you need to be able to deal with one failing completely
  • does not need to be huge. A SLOG is typically a few GB
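To put "a few GB" into numbers: a common rule of thumb is that the SLOG only has to hold the sync writes that arrive while a couple of transaction groups (txgs) are pending, since log blocks are no longer needed once their txg is on the pool disks. OpenZFS commits a txg at most every 5 seconds by default (the zfs_txg_timeout module parameter). The factor of two txgs and the link throughput figures below are assumptions, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Rule-of-thumb SLOG sizing (an approximation, not an official formula):
#   size ≈ peak sync write rate × txg interval × txgs to cover
set -euo pipefail

# Print an estimated SLOG size in MiB.
#   $1: peak sync write rate in MiB/s (your own estimate)
#   $2: txg interval in seconds (OpenZFS default zfs_txg_timeout is 5)
#   $3: number of txgs to cover (assumed 2 here)
slog_size_mib() {
  local rate_mib_s=$1
  local txg_timeout_s=${2:-5}
  local txgs=${3:-2}
  echo $(( rate_mib_s * txg_timeout_s * txgs ))
}

# Roughly what a saturated link can deliver (assumed figures).
echo "1 GbE  (~115 MiB/s):  $(slog_size_mib 115) MiB"
echo "10 GbE (~1150 MiB/s): $(slog_size_mib 1150) MiB"
```

By this rough estimate, clients on a gigabit link need only around 1 GiB of log, so the 8 GB partitions used here leave headroom; local VM disks can write faster than a network link, so plug in your own peak rate.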

As Intel Optane is really out of my budget, I settled on two secondhand Dell Intel DC S3700 SSDs. They are enterprise 2.5” SSDs with high write endurance and PLP.

These two SSDs are added to my ZFS pool as a mirror, so that should one of them die, the other one is still in place and my writes are safe.

Now that you have those fancy SSDs installed, how do you add a SLOG? I partitioned my SSDs so that each had an 8 GB partition, and added those partitions to the pool as a mirrored log:

# zpool add datapool log mirror /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_SSD1100FGN-part1 /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_SSD2100FGN-part1 
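If you would rather script both steps, here is a dry-run sketch that prints the partitioning and zpool add commands for a list of disks instead of running them. The partitioning tool isn't named above, so sgdisk (from the gdisk package) is an assumption, as is starting from blank disks; the disk IDs are the ones from the command above, and build_commands is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Dry run: print, but do not execute, the commands to create one 8 GiB
# partition on each SSD and add the pair to the pool as a mirrored log.
# Assumes blank disks and sgdisk (gdisk package).
set -euo pipefail

POOL=datapool
PART_SIZE=8G
DISKS=(
  /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_SSD1100FGN
  /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3R_SSD2100FGN
)

build_commands() {
  local disk
  local parts=()
  for disk in "${DISKS[@]}"; do
    # 1:0:+8G = partition 1, default (first free) start sector, 8 GiB long.
    echo "sgdisk -n 1:0:+${PART_SIZE} ${disk}"
    # udev exposes the new partition as <disk id>-part1.
    parts+=("${disk}-part1")
  done
  echo "zpool add ${POOL} log mirror ${parts[*]}"
}

build_commands
```

Check the printed commands against your own disk IDs before running them by hand; sgdisk will refuse or misbehave on disks that already have a partition 1.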

This command should return quickly. You can check the status of the SLOG using zpool status -v (output trimmed to the log mirror):

# zpool status -v
  pool: datapool
 state: ONLINE

        NAME                                                   STATE     READ WRITE CKSUM
        datapool                                               ONLINE       0     0     0
          mirror-1                                             ONLINE       0     0     0
            ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part1  ONLINE       0     0     0
            ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part1  ONLINE       0     0     0

errors: No known data errors

And you can monitor the usage with zpool iostat -v 1, which prints per-device statistics every second:

# zpool iostat -v 1
                                                         capacity     operations     bandwidth 
pool                                                   alloc   free   read  write   read  write
-----------------------------------------------------  -----  -----  -----  -----  -----  -----
datapool                                               12.4T  11.2T     42    196  3.00M  4.12M
logs                                                       -      -      -      -      -      -
  mirror                                               2.44M  7.50G      0    115      0  1.14M
    ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part1      -      -      0     57      0   585K
    ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part1      -      -      0     57      0   585K
-----------------------------------------------------  -----  -----  -----  -----  -----  -----
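On a pool with many data disks the log lines are easy to lose in that output. Here is a small awk filter that keeps only the logs section; it is a sketch that assumes the layout shown above (a "logs" header in column 0 followed by indented device lines), and log_section is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print only the "logs" section of `zpool iostat -v` output read from stdin.
# Assumes the layout shown above: a "logs" header line followed by indented
# device lines, ending at the next line that starts in column 0.
set -euo pipefail

log_section() {
  awk '
    /^logs/ && $1 == "logs" { in_logs = 1; next }  # start of the log vdevs
    /^[^[:space:]]/         { in_logs = 0 }        # next section or separator
    in_logs                 { print; fflush() }    # flush so streamed output appears live
  '
}

# Demo on a copy of the sample output above.
log_section <<'EOF'
datapool                                               12.4T  11.2T     42    196  3.00M  4.12M
logs                                                       -      -      -      -      -      -
  mirror                                               2.44M  7.50G      0    115      0  1.14M
    ata-INTEL_SSDSC2BA100G3R_BTTV335209Y0100FGN-part1      -      -      0     57      0   585K
    ata-INTEL_SSDSC2BA100G3R_BTTV3343004X100FGN-part1      -      -      0     57      0   585K
-----------------------------------------------------  -----  -----  -----  -----  -----  -----
EOF
```

On the live system you would pipe the real command into it, e.g. zpool iostat -v datapool 1 | log_section.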