Replacing Tailscale with a NixOS Module

October 28, 2022

After using Tailscale to manage my Wireguard VPN for the past few months, I decided to migrate my network off it and onto my own custom configuration. I found that I didn’t need any of the extra features Tailscale enables, and I enjoyed learning how to manage everything myself.

With NixOS, I was able to recreate Endpoint migration, which makes things a little nicer than a barebones setup. My NixOS module lets me configure both a public and a private Endpoint for each peer, and it migrates connections between them as needed. I only really use this to migrate to and from my Private LAN.

It should go without saying, but Tailscale is (probably) free for you to use. You should probably stick with it if you don’t want to mess with network settings.

What is Wireguard?

Wireguard is a barebones VPN protocol that allows you to create secure connections between machines. Like SSH, it uses public/private keypairs, so every peer needs the public keys of the peers it connects to. Each node in the network needs a few settings. This is not an exhaustive list, but these are the most common options:

  1. An IP address
  2. A Public/Private keypair
  3. A port to use for wireguard connections
  4. A list of Peers that can connect, each containing:
    1. The peer’s public key
    2. A list of allowed IPs that the peer is able to send as (traffic to those IPs is also routed to that peer)
    3. Optionally, an Endpoint: an IP address/port combination where the peer can be reached over the network.

You can also specify a DNS server, as well as PostUp/PostDown scripts for making routing changes when the interface comes up or goes down.
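Here is a minimal sketch of a single node with one peer, to make that list concrete. It uses the same NixOS `networking.wg-quick.interfaces` options as the module further down and borrows that module's laptop and VPS values. The key is a placeholder, and DNS and PostUp/PostDown are left out.

```nix
{
  networking.wg-quick.interfaces.wgnet = {
    # 1. This node's address inside the VPN
    address = [ "10.42.1.2/16" ];
    # 2. This node's private key (its public key goes into the other peers' configs)
    privateKeyFile = "/var/keys/wgnet.priv.key";
    # 3. The UDP port to listen on for Wireguard connections
    listenPort = 53333;
    # 4. The peers that can connect
    peers = [
      {
        # 4.1 The peer's public key (placeholder)
        publicKey = "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC";
        # 4.2 Addresses this peer may send as (and that get routed to it)
        allowedIPs = [ "10.42.1.3/32" ];
        # 4.3 Optional: where the peer can be reached
        endpoint = "98.76.65.43:53333";
      }
    ];
  };
}
```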

Wireguard is relatively simple to configure for a static network, since it needs only a few pieces of information. The downside is that updates are tedious: adding or changing one device may mean editing the config on every other device in the network.

Discovery can also be a problem. A peer has to be reachable at the Endpoint configured for it; otherwise, the connection only comes up if the other side initiates it. Typically that means all of the devices you want to connect to need a stable IP:port where they can be reached.
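Wireguard relaxes this a little. It updates a peer’s Endpoint to the source address of the latest authenticated packet from that peer, so a peer with no configured Endpoint can still be reached once it has connected. The sketch below assumes a laptop behind NAT and a VPS with a stable public address, and borrows placeholder keys and addresses from the module further down. Only the laptop lists an Endpoint. `persistentKeepalive` sends a keepalive every 25 seconds so the NAT mapping stays open and the VPS can keep sending to the laptop.

```nix
# laptop: fragment of its NixOS configuration
{
  networking.wg-quick.interfaces.wgnet.peers = [
    {
      publicKey = "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC"; # the VPS (placeholder)
      allowedIPs = [ "10.42.1.3/32" ];
      # The VPS has a stable public IP:port, so the laptop can always find it
      endpoint = "98.76.65.43:53333";
      # Send a keepalive every 25 seconds to hold the NAT mapping open
      persistentKeepalive = 25;
    }
  ];
}

# vps: fragment of its NixOS configuration
{
  networking.wg-quick.interfaces.wgnet.peers = [
    {
      publicKey = "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"; # the laptop (placeholder)
      allowedIPs = [ "10.42.1.2/32" ];
      # No endpoint: it is learned from the laptop's incoming packets
    }
  ];
}
```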

Alternatively, you can route from peer to peer through a server that both are connected to. So long as clients can connect to the same Wireguard server, they can reach each other through it. This only works if the server forwards packets between peers and each client’s allowed IPs for the server cover the other clients’ addresses.
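Here is a minimal sketch of that hub-and-spoke arrangement. It assumes the VPS acts as the hub for the 10.42.0.0/16 network, with keys and addresses as placeholders. The hub has kernel IP forwarding enabled and keeps one /32 per client. Each client routes the whole VPN subnet to the hub instead of just the hub’s own address. Depending on your firewall setup, you may also need to allow forwarded traffic on the hub.

```nix
# vps (hub): fragment of its NixOS configuration
{
  # Let the kernel forward packets that arrive from one peer and are destined for another
  boot.kernel.sysctl."net.ipv4.ip_forward" = 1;

  # One /32 per client, so the hub knows which peer owns which address
  networking.wg-quick.interfaces.wgnet.peers = [
    { publicKey = "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"; allowedIPs = [ "10.42.1.2/32" ]; } # laptop
    { publicKey = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; allowedIPs = [ "10.42.1.1/32" ]; } # desktop
  ];
}

# laptop (client): fragment of its NixOS configuration
{
  networking.wg-quick.interfaces.wgnet.peers = [
    {
      publicKey = "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC"; # the hub
      # Route the whole VPN subnet through the hub, not just the hub's own address
      allowedIPs = [ "10.42.0.0/16" ];
      endpoint = "98.76.65.43:53333";
    }
  ];
}
```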

The other downside is that only one Endpoint can be defined for each peer. The Endpoint is where Wireguard will look for the peer to make a connection. For most home users, a peer will often have a public Endpoint (your home IP address plus the port you’ve forwarded from your router) and a private Endpoint (the local IP address on the LAN), and maybe more. Wireguard has no built-in mechanism for checking multiple Endpoints or migrating between them.

What is Tailscale?

Tailscale manages all of these Wireguard problems for you. It handles NAT traversal so that all your peers are reachable, it distributes the public keys among all your devices, and it has a number of nice ease-of-use features built into the network (like DNS).

For most people, and especially for large deployments, Tailscale just works. There is even a third-party OSS implementation of their control server, Headscale, if you’d rather self-host and get most of the features.

The vast majority of people should just install Tailscale and be done with it.

For my personal use case, though, I had a few issues:

  1. It doesn’t play nicely with my “VPN” of choice, Mullvad, even though it probably could.
  2. You must use a third-party identity provider to log in, so it makes my VPN dependent on my Google account.
  3. The VPN setup becomes dependent on a Tailscale server to work. I don’t want to be unable to connect to my VPN if Tailscale is down.
  4. I like having bespoke IP addresses which you can’t get with Tailscale.

Manually Configuring Wireguard

Because of my concerns with using Tailscale and my desire to make things more complicated than necessary, I migrated my VPN over to a custom setup using a mix of manual management and a NixOS module.

This is only reasonable for me to do because:

  1. I have a small, static network of only a handful of servers and clients.
  2. My home network has a static IP address, and I can configure port forwarding easily.
  3. I don’t have a need to share access to one of these servers with other people, certainly not someone who would also be using Tailscale.
  4. Effectively, I only connect client -> server and never client -> client, which simplifies the setup even more.
  5. Most of my devices run NixOS, so I can turn this from a manual management problem into a software engineering problem.

This has been done and written about before, most notably in Xe’s post “My Automagic NixOS Wireguard Setup”. They have since migrated to Tailscale, because their network became more complex and they wanted some of the niceties that Tailscale offers.

By moving the setup into a NixOS module, I can update the network by making changes in one place, and then pushing the update to all my NixOS machines. I do have to manually manage the config for my NAS, which is not on NixOS, but updating 2 different places turns out to be fairly manageable.

Beyond configuring Wireguard, this module also sets up a systemd service that migrates connections to my Private LAN. Two of my Wireguard servers, my NAS and my Desktop, sit on my local network. I want clients to use their Public IP:Port normally and their Private IP:Port when on the local network. This isn’t strictly necessary, since I can still connect to them with the Public IP:Port from my local network. But I stream a lot of data between those devices, so it’s better to keep that traffic on the LAN than to send it out to my Public IP and back.

I do still occasionally have some issues running both Mullvad and this private network together, particularly when trying to SSH between machines outside of the local network. Unlike when I was using Tailscale, restarting the connection has always fixed the problem, which makes me suspect something is going wrong with the iptables routing.

NixOS Module

Here’s the code for my NixOS module. It is heavily commented; to use it, copy it into your config and customize it for your own network.

{ config, lib, pkgs, ... }:

# Wgnet network
# This contains the network config for my local LAN running over wireguard

# 123.45.67.89 - Fake Home IP address
# 10.41.0.0/16 - LAN addresses with static mapping
# 10.42.0.0/16 - Addresses in the Virtual Wireguard Network

# Adding a new device:
# MANUAL: Create wireguard key(s)
#  Each peer needs to have a public/private keypair created. That can be done with
#  the following commands run as root:
#   > umask 077
#   > mkdir -p /var/keys
#   > wg genkey | tee /var/keys/wgnet.priv.key | wg pubkey > /var/keys/wgnet.pub.key
#   > chmod 0644 /var/keys/wgnet.pub.key
# MANUAL: Setup port forwarding on router for 'server' devices to accept incoming connections

with lib;
let
  cfg = config.modules.services.wgnet;
  homeIP = "123.45.67.89";

  # Map each hostname to its VPN address. This is derived from the device list
  # below so that each address only has to be written down once.
  address = listToAttrs (map (x: nameValuePair x.hostname x.wgnetAddress) devices);

  # Define all the peers in the network
  # Each peer has
  #  hostname: The hostname, mapped in the hosts file
  #  altHostname: Optional, also mapped in the hosts file
  #  wgnetAddress: IP to use in the VPN
  #  publicKey: Wireguard public key
  #  publicEndpoint: Optional, publicly addressable location
  #  privateEndpoint: Optional, private address that's not always routable
  devices = [
    {
      hostname = "desktop";
      wgnetAddress = "10.42.1.1";
      publicEndpoint = {
        ip = homeIP;
        port = "54444"; # Forwarded via router
      };
      privateEndpoint = {
        ip = "10.41.1.1";
        port = "53333";
      };
      publicKey = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
    }
    {
      hostname = "laptop";
      wgnetAddress = "10.42.1.2";
      privateEndpoint = {
        ip = "10.41.1.2";
        port = "53333";
      };
      publicKey = "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB";
    }
    { # Virtual Private Server on AWS
      hostname = "vps";
      wgnetAddress = "10.42.1.3";
      publicEndpoint = {
        ip = "98.76.65.43";
        port = "53333";
      };
      publicKey = "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC";
    }
    { # Home NAS in my local network
      hostname = "nas";
      wgnetAddress = "10.42.1.4";
      publicEndpoint = {
        ip = homeIP;
        port = "54445"; # Forwarded via router
      };
      privateEndpoint = {
        ip = "10.41.1.4";
        port = "53333";
      };
      publicKey = "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC";
    }
  ];
in {
  options.modules.services.wgnet = {
    enable = mkEnableOption "wgnet domain";

    address = mkOption {
      type = types.str;
      example = "10.42.1.1";
      default = address.${config.networking.hostName};
      description = "Address of device on the private network.";
    };

    migrate = mkOption {
      type = types.bool;
      default = true;
      description =
        "Run a background service to migrate public -> private endpoints.";
    };
  };

  config = mkIf cfg.enable {
    networking = {
      # Map the hostnames for easy addressing, both as "hostname" and "hostname.wgnet"
      hosts = listToAttrs (map (x: {
        name = x.wgnetAddress;
        value = [ x.hostname "${x.hostname}.wgnet" ]
          ++ optionals (x ? altHostname) [
            x.altHostname
            "${x.altHostname}.wgnet"
          ];
      }) devices);
      firewall = {
        allowedUDPPorts = [ 53333 ];
        # Force this to false because wg-quick and mullvad don't play nice with each other.
        checkReversePath = lib.mkForce false;
        # Trust all traffic on this interface
        trustedInterfaces = [ "wgnet" ];
      };
      wg-quick.interfaces.wgnet = {
        privateKeyFile = "/var/keys/wgnet.priv.key";
        address = [ "${cfg.address}/16" ];
        listenPort = 53333;

        # There is no problem with having a peer device listed for itself; it is ignored.
        # Endpoints are a hint as to where to find the device, but connections can be accepted from anywhere.

        # Maps the devices list to configured Wireguard peers.
        # It adds an endpoint if that device has a public endpoint, which
        # should be the case for all servers.
        peers = map (x: {
          inherit (x) publicKey;
          allowedIPs = [ "${x.wgnetAddress}/32" ];
          endpoint = if (x ? publicEndpoint) then
            "${x.publicEndpoint.ip}:${x.publicEndpoint.port}"
          else
            null;
        }) devices;
      };
    };

    # This service runs every 5 minutes (and whenever the wgnet interface starts) to
    # move wireguard connections to the LAN if applicable.
    # This is necessary because we want the roaming devices to know how to connect back
    # using the homeIP, but also to prefer the private IPs when they are reachable.
    # Wireguard doesn't support listing multiple endpoints for devices, which would be a better fix.
    systemd.services.wgnet-to-lan = mkIf cfg.migrate {
      description = "Attempt to migrate wireguard connections to LAN.";
      wantedBy = [ "wg-quick-wgnet.service" ];
      # Start only after the interface is up; wantedBy alone doesn't order the units
      after = [ "wg-quick-wgnet.service" ];
      # Run every 5 minutes
      startAt = [ "*:0/5" ];
      # Each run checks the peers once and exits
      serviceConfig.Type = "oneshot";

      # Check all peers with both a private and public endpoint.
      # If a peer is already on its private endpoint but the tunnel no longer answers
      # (e.g. this device left the LAN), revert it to the public endpoint.
      # Otherwise, reset the endpoint to the private one iff
      #   1. We can see the device locally and through the wireguard tunnel
      #   2. The wireguard tunnel doesn't already see it locally
      script = let
        # Setup shorthand links for readability
        wg = "${pkgs.wireguard-tools}/bin/wg";
        grep = "${pkgs.gnugrep}/bin/grep";
        ping = "${pkgs.unixtools.ping}/bin/ping";
      in concatStringsSep "\n" (map (x:
        let
          # Only evaluated for peers that define both endpoints (Nix is lazy)
          privateEp = "${x.privateEndpoint.ip}:${x.privateEndpoint.port}";
          publicEp = "${x.publicEndpoint.ip}:${x.publicEndpoint.port}";
        in if (x.hostname != config.networking.hostName && x ? publicEndpoint
          && x ? privateEndpoint) then ''
            # Look only at this peer's line; -F matches literally (no regex dots)
            if ${wg} show wgnet endpoints | ${grep} -F "${x.publicKey}" | ${grep} -qF "${privateEp}"; then
              if ! ${ping} -c1 -W5 ${x.wgnetAddress}; then
                echo "Peer ${x.hostname} unreachable via ${privateEp}. Reverting to ${publicEp}."
                ${wg} set wgnet peer ${x.publicKey} endpoint ${publicEp}
              fi
            elif ${ping} -c1 -W5 ${x.privateEndpoint.ip} && ${ping} -c1 -W5 ${x.wgnetAddress}; then
              echo "Attempting to reset peer ${x.hostname} endpoint to ${privateEp}"
              ${wg} set wgnet peer ${x.publicKey} endpoint ${privateEp}
              if ! ${ping} -c1 -W5 ${x.wgnetAddress}; then
                echo "Failed to reset peer ${x.hostname} to ${privateEp}. Reverting to ${publicEp}."
                ${wg} set wgnet peer ${x.publicKey} endpoint ${publicEp}
              fi
            fi
          '' else
          "") devices);
    };
  };
}
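Enabling the module on one of the hosts might look roughly like this. It assumes the module is saved as `./wgnet.nix` next to the host’s configuration (a hypothetical path). It also assumes the host’s `networking.hostName` matches one of the `hostname` entries, since the module looks up the default `address` by hostname.

```nix
{ ... }:
{
  imports = [ ./wgnet.nix ];

  # Must match a `hostname` in the module's device list
  networking.hostName = "laptop";

  modules.services.wgnet = {
    enable = true;
    # Keep the background endpoint migration on (this is already the default)
    migrate = true;
  };
}
```

After a `nixos-rebuild switch`, `wg show wgnet` should list the other devices as peers.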

Other Thoughts

Having worked with this quite a bit, I think there is room for an alternative piece of VPN management software. Rather than relying on a control server the way Tailscale does, I think it would make sense to use something akin to Syncthing. Syncthing, a P2P folder sync service, solves many of the same problems that a Wireguard setup needs to solve: device discovery and NAT traversal. I doubt there’s enough demand to justify the work needed to write such a thing, but I’d probably use it if it existed.