How we implemented traffic routing in Meshnet for increased security
Povilas Nagrockas
June 18, 2024
Traffic routing is a significant feature provided by NordVPN’s Meshnet. Essentially, it allows any Meshnet-connected device, called a node, to become a VPN server for other nodes. To do this, the end user needs only to click a few buttons in the NordVPN app. Under the hood, however, we need to handle setup complexities in the code and ensure compatibility with platforms not inherently designed to operate as VPN servers.
How a classical VPN server works
First, we should understand how a classical VPN server operates. Meshnet uses the NordLynx protocol, which is based on WireGuard® – a simple, fast VPN that uses state-of-the-art cryptography. For this article, we’ll refer to WireGuard (wg) in our examples and graphics.
A standard configuration would look like this:
A standard VPN configuration.
To arrive at this setup, a couple of things need to happen.
First, let’s establish a secure tunnel (purple dotted connection):
Virtual network interfaces labeled wgC and wgS (which work like tun adapters) are created on both the client and server sides.
The client uses a UDP socket to establish a cryptographic session with the server’s address at 192.0.2.1:51820 (in the diagram above, the 192.0.2.0/24 subnet represents the wide area network).
Private IP addresses (100.64.0.2, 100.64.0.1) are assigned to the client and server respectively.
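For plain WireGuard, the same two-node setup can be sketched as a pair of wg-quick configuration files. The helper functions below are hypothetical, not NordLynx’s actual configuration; the keys are placeholders, and the addresses and port are the ones from the diagram.

```python
# Sketch: render wg-quick style configs for the client/server pair above.
# Keys are placeholders; real ones would come from `wg genkey` / `wg pubkey`.


def client_config(client_private_key: str, server_public_key: str, route_all: bool = False) -> str:
    # With route_all, AllowedIPs = 0.0.0.0/0 sends every destination through the
    # tunnel (wg-quick then installs the default route); otherwise only the
    # server's tunnel IP is reachable through it.
    allowed_ips = "0.0.0.0/0" if route_all else "100.64.0.1/32"
    return "\n".join([
        "[Interface]",
        f"PrivateKey = {client_private_key}",
        "Address = 100.64.0.2/32",
        "",
        "[Peer]",
        f"PublicKey = {server_public_key}",
        "Endpoint = 192.0.2.1:51820",  # server's WAN address and UDP port
        f"AllowedIPs = {allowed_ips}",
    ]) + "\n"


def server_config(server_private_key: str, client_public_key: str) -> str:
    return "\n".join([
        "[Interface]",
        f"PrivateKey = {server_private_key}",
        "Address = 100.64.0.1/32",
        "ListenPort = 51820",  # UDP port the client connects to
        "",
        "[Peer]",
        f"PublicKey = {client_public_key}",
        "AllowedIPs = 100.64.0.2/32",  # only the client's tunnel IP is accepted/routed
    ]) + "\n"


if __name__ == "__main__":
    print(client_config("<client-private-key>", "<server-public-key>"))
    print(server_config("<server-private-key>", "<client-public-key>"))
```

Flipping `route_all` on the client corresponds to the "route all default traffic through the tunnel" step described next.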
At this point, the client can ping the server using the IP address 100.64.0.1, and the server can ping the client at 100.64.0.2. All IP packets sent through the wgX interface are encrypted and sent via the global internet. The real path of the packet is something like this: wgC –(encapsulate)–> lanC –> lanR –> netR –> netS –(decapsulate)–> wgS
But to the OS, the wgX interface is just another network connection through which IP traffic can be routed, like the connection to a LAN router.
To the OS, a virtual interface is just like any other network connection.
Now, if the client wants to conceal its real IP address, it can configure its routing table to direct all default traffic through the wgC interface (some precautions are needed to avoid routing the encrypted tunnel traffic itself back into the tunnel, but that’s out of the scope of this article).
Meanwhile, the VPN server needs to be configured to function like a router, accepting incoming packets and forwarding them to their next destination. For this, two features are required:
IP forwarding
By default, most network stack implementations do not pass packets from one interface to another: a packet that arrives on an interface is accepted only if it is addressed to the host itself. So when the server receives a packet on the wgS interface that’s directed to an IP address outside its own subnets, the packet is dropped.
Enabling IP forwarding changes this behavior. Now, when such a packet arrives on a network interface, its destination is checked against the host’s entire routing table, and if the best-matching route points to another network interface, the packet is forwarded out of that interface.
The packet path on the server would look like this: … -> wgS –(ip_forwarding)–> netS -> …
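To make the forwarding decision concrete, here is a minimal sketch of what the server does with an incoming packet. The two local addresses come from the diagram; the 100.64.0.0/24 tunnel subnet and the routing table itself are assumptions for illustration.

```python
import ipaddress

# Hypothetical server state: addresses the host owns and a tiny routing table.
LOCAL_ADDRESSES = {ipaddress.ip_address("100.64.0.1"), ipaddress.ip_address("192.0.2.1")}
ROUTES = [
    (ipaddress.ip_network("100.64.0.0/24"), "wgS"),  # assumed tunnel subnet
    (ipaddress.ip_network("192.0.2.0/24"), "netS"),  # WAN subnet
    (ipaddress.ip_network("0.0.0.0/0"), "netS"),     # default route
]


def lookup(dst: ipaddress.IPv4Address) -> str:
    """Longest-prefix match: the most specific route containing dst wins."""
    matching = [(net, iface) for net, iface in ROUTES if dst in net]
    return max(matching, key=lambda route: route[0].prefixlen)[1]


def handle_packet(dst: str, ip_forwarding: bool) -> str:
    addr = ipaddress.ip_address(dst)
    if addr in LOCAL_ADDRESSES:
        return "deliver locally"  # addressed to the server itself
    if not ip_forwarding:
        return "drop"             # not for us, and the host is not acting as a router
    return f"forward via {lookup(addr)}"
```

With forwarding disabled, a packet from the client to a public address such as 198.51.100.7 is dropped; with it enabled, the default route sends it out through netS.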
NATing
IP packets arriving at the VPN server will have a private source IP address like 100.64.0.2, assigned to the wgC interface. In most cases, these packets will be directed to a publicly routable IP address. After the packet is forwarded to the netS interface, it still can’t be sent out, because its source address falls within a private, non-publicly-routable range. The router uplink only deals with public IP addresses and wouldn’t know which device sent the packet or where to deliver the replies.
This is why NAT (network address translation) is used. For each unique combination of source IP, source port, and in some cases destination, a mapping is created in the NAT table.
For example, if a TCP packet comes from 100.64.0.2:AAAA, it would be mapped to a 192.0.2.1:BBBB address (here AAAA is the port used by software on the client device, and BBBB is a randomly assigned unused port on the server).
The TCP packet’s source IP and port are then replaced with the mapped values, its checksum is adjusted, and it is finally sent on its merry way to the wider internet.
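The checksum needs adjusting because the TCP checksum also covers a pseudo-header that includes the source and destination IP addresses. Rather than recomputing it over the whole segment, a NAT can patch it incrementally with the RFC 1624 formula HC' = ~(~HC + ~m + m'), where m and m' are the old and new 16-bit words. The sketch below illustrates that formula; it is not NordVPN’s code, and the header words in the example are made up.

```python
import ipaddress


def fold(total: int) -> int:
    """Fold carries back into the low 16 bits (one's complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(words: list[int]) -> int:
    """Internet checksum over 16-bit words, computed from scratch."""
    return ~fold(sum(words)) & 0xFFFF


def update_checksum(old_checksum: int, old_words: list[int], new_words: list[int]) -> int:
    """RFC 1624, eq. 3: HC' = ~(~HC + ~m + m') for each replaced word m -> m'."""
    total = ~old_checksum & 0xFFFF
    total += sum(~m & 0xFFFF for m in old_words)
    total += sum(new_words)
    return ~fold(total) & 0xFFFF


def ip_words(ip: str) -> list[int]:
    """Split an IPv4 address into its two 16-bit words."""
    b = ipaddress.ip_address(ip).packed
    return [(b[0] << 8) | b[1], (b[2] << 8) | b[3]]
```

For example, rewriting 100.64.0.2:40000 to 192.0.2.1:50000 only touches three words (two for the address, one for the port), so the update costs the same regardless of payload size, and the result matches a full recomputation.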
If a remote host replies to port BBBB, the NAT table is consulted, the destination IP and port are restored to their original values, and the packet is sent back through the wgS interface to the client’s wgC interface.
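The mapping logic above can be sketched as a small NAT table. This is a simplified illustration, not NordVPN’s NAT implementation: ports are handed out sequentially from an arbitrary 50000 for readability (real implementations typically pick random free ports), and there are no timeouts or port-exhaustion handling.

```python
from typing import Optional

Endpoint = tuple[str, int]  # (IP address, port)


class NatTable:
    def __init__(self, public_ip: str, first_port: int = 50000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._by_private: dict[tuple[str, str, int], int] = {}  # (proto, ip, port) -> public port
        self._by_public: dict[tuple[str, int], Endpoint] = {}   # (proto, public port) -> private endpoint

    def outbound(self, proto: str, src_ip: str, src_port: int) -> Endpoint:
        """Translate a packet leaving the tunnel; create a mapping on first use."""
        key = (proto, src_ip, src_port)
        if key not in self._by_private:
            port = self._next_port
            self._next_port += 1
            self._by_private[key] = port
            self._by_public[(proto, port)] = (src_ip, src_port)
        return self.public_ip, self._by_private[key]

    def inbound(self, proto: str, dst_port: int) -> Optional[Endpoint]:
        """Translate a reply back to the client; None means no mapping, so drop it."""
        return self._by_public.get((proto, dst_port))
```

With `nat = NatTable("192.0.2.1")`, `nat.outbound("tcp", "100.64.0.2", 40000)` returns `("192.0.2.1", 50000)`, and a reply to that port is mapped back with `nat.inbound("tcp", 50000)`, which returns `("100.64.0.2", 40000)`.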
And that’s all for a very rudimentary setup!
Supported platforms
The main challenge with these two requirements is that they limit which devices can function as routers (short of implementing transport-layer multiplexing/demultiplexing logic in user space).
Typically, setting up IP forwarding and NAT requires root/administrator permissions. Most platforms with strong sandboxing, such as the macOS App Store version, iOS, and Android, do not provide official APIs to enable this.
That leaves three “platforms” we do support:
Linux
Linux is the easiest one of the bunch because it has everything we need already built in, and our NordVPN service, running as root, can set everything up.
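On a plain Linux host, the two requirements map onto one sysctl and one iptables rule. The sketch below builds those commands and, by default, only prints them; the 100.64.0.0/10 tunnel range and the eth0 uplink name are assumptions, and the NordVPN service may well configure things differently (for example with nftables or extra FORWARD-chain rules).

```python
import subprocess


def linux_router_commands(tunnel_subnet: str, wan_iface: str) -> list[list[str]]:
    return [
        # 1. IP forwarding: let the kernel route packets between interfaces.
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        # 2. NAT: rewrite the source of tunnel traffic leaving via the uplink
        #    to the uplink's own address (MASQUERADE picks it automatically).
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-s", tunnel_subnet, "-o", wan_iface, "-j", "MASQUERADE"],
    ]


def apply(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # requires root


if __name__ == "__main__":
    apply(linux_router_commands("100.64.0.0/10", "eth0"))
```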
macOS Sideload
Unlike the App Store version (which I count as a separate platform), the macOS Sideload application can create launchd services that run with root permissions. This unlocks features that Darwin (the Unix core of macOS) inherits from BSD, such as IP forwarding and pf (packet filter), which are used to set up NAT and filtering.
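As a rough sketch of what such a root service could do, the snippet below produces a pf NAT rule and the commands that enable BSD-style forwarding and load the rule. The interface name, tunnel range, and rules path are assumptions. Note that `pfctl -f` replaces the main ruleset, so a production service would more likely load its rules into a dedicated anchor instead.

```python
def pf_nat_rule(tunnel_subnet: str, wan_iface: str) -> str:
    # "(en0)" in parentheses tells pf to follow the interface's current address.
    return f"nat on {wan_iface} from {tunnel_subnet} to any -> ({wan_iface})\n"


def macos_router_commands(rules_path: str) -> list[list[str]]:
    return [
        ["sysctl", "-w", "net.inet.ip.forwarding=1"],  # BSD-style IP forwarding
        ["pfctl", "-f", rules_path],                   # load the NAT rule from a file
        ["pfctl", "-e"],                               # enable pf
    ]


if __name__ == "__main__":
    print(pf_nat_rule("100.64.0.0/10", "en0"), end="")
    for cmd in macos_router_commands("/tmp/meshnet-nat.conf"):
        print(" ".join(cmd))  # printed only; running them requires root
```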
Windows
Setting up IP forwarding is as trivial as a registry modification. Windows does have an official NAT, but we found it difficult to use during testing. It does not work properly on Windows Home editions, and because it is primarily designed for use with Hyper-V, a lot of undefined behavior crops up when it is combined with our custom adapter drivers. To work around this, we built and shipped our own NAT implementation.
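One well-known way to make that registry change is the IPEnableRouter value under the Tcpip Parameters key; the article doesn’t say which setting the app uses, so treat this as an assumption. A minimal sketch using Python’s standard winreg module follows. It needs administrator rights, the change may only take effect after a reboot, and it covers only forwarding, not the NAT part.

```python
import sys

# Well-known location of the IPv4 routing switch in the Windows registry.
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "IPEnableRouter"


def enable_ip_forwarding() -> None:
    """Set IPEnableRouter = 1 (REG_DWORD). Requires administrator rights."""
    import winreg  # Windows-only module, imported here so the file loads elsewhere

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, 1)


if __name__ == "__main__" and sys.platform == "win32":
    enable_ip_forwarding()
```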
How Meshnet traffic routing works
Now that we know how a regular VPN server looks and works, we can compare it to how it operates in Meshnet:
A Meshnet VPN configuration.
The first interesting difference is that, unlike a dedicated VPN server, both Meshnet devices will generally be located inside their own local area networks.
Without Meshnet’s NAT traversal capabilities, turning such a device into a VPN server that other devices can easily connect to would be challenging.
The second difference is that routed traffic will usually go through not one, but two NAT steps:
Device A’s (the client’s) source IP is replaced with device B’s (the server’s) IP.
Device B’s IP is then replaced with its router’s public IP.
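Chaining two source-NAT steps shows why replies still find their way back: each hop keeps its own mapping and undoes it in reverse order. In the sketch below, the addresses and ports are hypothetical (device A’s Meshnet IP, device B’s LAN IP, and the router’s public IP), and the `SnatHop` class is an illustration, not Meshnet code.

```python
from dataclasses import dataclass, replace

Endpoint = tuple[str, int]  # (IP address, port)


@dataclass(frozen=True)
class Packet:
    src: Endpoint
    dst: Endpoint


class SnatHop:
    """One source-NAT step: rewrite the source going out, restore it on replies."""

    def __init__(self, public_ip: str, first_port: int):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map: dict[Endpoint, Endpoint] = {}   # original src -> translated src
        self.back_map: dict[Endpoint, Endpoint] = {}  # translated src -> original src

    def egress(self, pkt: Packet) -> Packet:
        if pkt.src not in self.out_map:
            translated = (self.public_ip, self.next_port)
            self.next_port += 1
            self.out_map[pkt.src] = translated
            self.back_map[translated] = pkt.src
        return replace(pkt, src=self.out_map[pkt.src])

    def ingress(self, pkt: Packet) -> Packet:
        return replace(pkt, dst=self.back_map[pkt.dst])


device_b = SnatHop("192.168.1.10", 40000)  # step 1: A's Meshnet IP -> B's LAN IP
router = SnatHop("203.0.113.5", 60000)     # step 2: B's LAN IP -> router's public IP

outgoing = router.egress(device_b.egress(Packet(("100.64.0.2", 5555), ("198.51.100.7", 443))))
reply = Packet(src=outgoing.dst, dst=outgoing.src)
delivered = device_b.ingress(router.ingress(reply))  # undo the steps in reverse order
```

The remote host only ever sees the router’s public address, which is exactly why traffic routed through device B looks like it came from B’s network.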
This enables some interesting behavior: if device A is your phone and device B is your home PC, routing through B makes your phone appear to your home network as your PC. This lets you securely access your home network without hosting any services on a public server.
And if you use a service that only allows access from your home network, it becomes impossible to tell whether the network messages are coming directly from your home PC or a device routing through it.
At this point, if you are even slightly inclined towards security, some alarm bells may be ringing.
Security considerations
Traffic routing is a very powerful feature:
You can take over a local network.
The device functioning as a VPN server can inspect all traffic going through it.
Other devices can essentially mimic your device.
As such, we want users to have as much control as possible, so each device has a couple of flags that can be set per connection:
Allow traffic routing: Specifies if a device can route its traffic through the device acting as a VPN server at all
Allow local network access: Specifies if the device can communicate with other devices in the server’s local area network
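A per-peer check on the routing device might combine the two flags roughly as follows. The `PeerPermissions` type, its field names, and the assumption that the flags are evaluated independently (destinations in the routing device’s LAN governed by local network access, everything else by traffic routing) are illustrative, not NordVPN’s actual data model.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerPermissions:
    allow_traffic_routing: bool
    allow_local_network_access: bool


def may_forward(perms: PeerPermissions, dst: str, lan_subnet: str) -> bool:
    """Decide whether the routing device forwards a peer's packet to dst."""
    if ipaddress.ip_address(dst) in ipaddress.ip_network(lan_subnet):
        return perms.allow_local_network_access  # a device in the router's own LAN
    return perms.allow_traffic_routing           # anything else, e.g. the internet
```

For instance, a peer allowed to route but denied local network access can reach 198.51.100.7 through the device, but not a printer at 192.168.1.20 on its 192.168.1.0/24 LAN.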
Generally, when using this feature, we want to avoid behaviors that might not be obvious at first glance.
A great example of this is a security issue we found and mitigated during development:
Traffic routing could cause unexpected security issues.
Let’s say we have two Meshnet accounts: Mesh X with devices A and B, and Mesh Y with device C. Device C has an external connection to device B, which allows traffic routing for C.
Without any additional network rules, when C is routing through B and pings A’s private IP, it would actually reach A, even though they are not configured to be directly connected. It does not even require NAT to work in this case.
So without any explicit user input, device B has unintentionally exposed device A to device C.
To prevent this, we ensure that all such packets are dropped by B. The only way for C to reach A is to send a Meshnet invite and form a direct connection, making this relationship explicit.
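One way to express that rule is to have the routing device refuse to forward any packet whose destination is a Meshnet address other than its own. The sketch assumes Meshnet addresses come from 100.64.0.0/10 and uses made-up addresses for A and B; it illustrates the policy rather than the actual filter implementation.

```python
import ipaddress

MESHNET_RANGE = ipaddress.ip_network("100.64.0.0/10")  # assumed Meshnet address range


def should_drop(dst: str, own_meshnet_ip: str) -> bool:
    """Decide whether the routing device (B) drops a packet a routed peer sent to dst."""
    addr = ipaddress.ip_address(dst)
    if addr == ipaddress.ip_address(own_meshnet_ip):
        return False  # addressed to B itself: delivered locally, not forwarded
    # Forwarding into Meshnet would let a routed peer reach devices it isn't connected to.
    return addr in MESHNET_RANGE
```

With B at 100.64.0.20 and A at 100.64.0.10, C’s ping to A is dropped, while C’s traffic to B itself or to the public internet is unaffected.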
In short, traffic routing is a relatively simple technical solution that unlocks many interesting capabilities in the Meshnet network.