Tag-Based IPsec VPN Failover
Authors: Mitchell Gulledge, Raul Ricano and Chris Weber
This document describes the benefits and uses of Tag-Based VPN Failover. It serves as a reference for the recommended architecture so that customers get the most benefit from this technology.
Overview
Tag-Based VPN Failover is used for third-party data center failover and OTT SD-WAN integration. It is accomplished by using the API at each branch or data center. Each MX appliance uses IPsec VPN with cloud VPN nodes, and the API handles the dynamic tag allocation that determines which IPsec peer a network uses.

A typical VPN topology for enterprise routing can be seen above. In this use case, the design provides DC-DC failover for branch (spoke) sites. If there is a failure on any of the monitored IPs (IPsec peers), there will be an automatic, secure, and reliable failover. For DC-DC failover to be achieved, the following behavior must occur:
- Spoke sites will form a VPN tunnel to the primary DC.
- Dual active VPN tunnels to both DCs are not possible with IPsec, because interesting traffic is often needed to bring up an IPsec tunnel, and that traffic is always routed to the first tunnel/peer configured and never the second.
- Each spoke is configured with a tracked IP of its primary DC on the traffic shaping page.
- If the tracked IP has experienced loss in the last 5 minutes, the API script (below) re-tags the network in order to swap to the secondary IPsec VPN tunnel.
- Once the tracked IP has had no loss in the last 5 minutes, the tags are swapped back to return to the primary DC. Waiting for a full loss-free window avoids flapping.
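The failover rules listed above can be sketched as a small pure function. This is only an illustration under stated assumptions: the `decide_action` name and the shape of `samples` (loss percentages covering the last 5 minutes, with `None` for missing data) are hypothetical, and the 30% threshold is borrowed from the sample script in this document.

```python
from typing import List, Optional

LOSS_THRESHOLD = 30.0  # percent; assumption borrowed from the sample script


def decide_action(samples: List[Optional[float]], on_backup: bool) -> str:
    """Return 'failover', 'failback', or 'hold' for one tracked IP.

    samples:   loss percentages for the last 5 minutes (None = no data).
    on_backup: True if the network is currently tagged onto the backup peer.
    """
    lossy = any(s is not None and s >= LOSS_THRESHOLD for s in samples)
    if lossy and not on_backup:
        return "failover"  # primary looks unhealthy: swap tags to the backup
    if not lossy and on_backup:
        return "failback"  # a full clean window: swap tags back to the primary
    return "hold"          # nothing to change


print(decide_action([0.0, 41.7, 0.0], on_backup=False))  # failover
```

Keeping the decision separate from the API calls makes it easy to check the anti-flapping rule: a network on the backup peer only fails back when every sample in the window is below the threshold.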
Sample API Solution
The following code is one sample Python implementation of this solution. The sections below describe how it works.
Prerequisites
Add your API key to the api_key variable and your organization ID to the url variable in the code.
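As an alternative to editing the values in place, the key and organization ID could be read from the environment. The following is a minimal sketch: the environment variable names `MERAKI_API_KEY` and `MERAKI_ORG_ID` and the helper names are assumptions, not part of the original script.

```python
import os

BASE_URL = "https://api.meraki.com/api/v0"


def build_loss_url(org_id: str) -> str:
    """Build the organization-wide uplink loss and latency URL."""
    if not org_id:
        raise ValueError("organization ID is required")
    return f"{BASE_URL}/organizations/{org_id}/uplinksLossAndLatency"


def build_headers(api_key: str) -> dict:
    """Build the same request headers that the sample script uses."""
    return {"X-Cisco-Meraki-API-Key": api_key, "Content-Type": "application/json"}


# Hypothetical variable names; unset values fall back to empty strings.
api_key = os.environ.get("MERAKI_API_KEY", "")
org_id = os.environ.get("MERAKI_ORG_ID", "")
if org_id:
    url = build_loss_url(org_id)
    header = build_headers(api_key)
```

This keeps credentials out of the script file, which matters if the script is shared or stored in version control.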
Topology
Dashboard Configuration
Tracked IPs
Navigate to Security & SD-WAN > Traffic Shaping and add the IP of the primary peer under the uplink statistics. The MX will start sending ICMP requests to this IP to track reachability. This data can be viewed on the Security & SD-WAN > Appliance Status > Uplink page and can be obtained via the API.
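To illustrate how the loss data for a tracked IP might be pulled out of an API response, here is a small sketch. The entry shape (`networkId`, `uplink`, `ip`, and a `timeSeries` list with `lossPercent`) is assumed from the fields the sample script in this document reads; the function name and the sample data are hypothetical and hand-written, not real API output.

```python
from typing import Dict, List


def max_loss_by_network(entries: List[dict], tracked_ip: str) -> Dict[str, float]:
    """Map networkId -> highest lossPercent seen for the tracked IP."""
    worst: Dict[str, float] = {}
    for entry in entries:
        if entry["ip"] != tracked_ip:
            continue  # skip other monitored destinations, e.g. 8.8.8.8
        losses = [point["lossPercent"] for point in entry["timeSeries"]
                  if point.get("lossPercent") is not None]
        if losses:
            net = entry["networkId"]
            worst[net] = max(worst.get(net, 0.0), max(losses))
    return worst


# Hand-written sample data; not real API output.
sample = [
    {"networkId": "N_1", "uplink": "wan1", "ip": "8.8.8.8",
     "timeSeries": [{"lossPercent": 0.0}]},
    {"networkId": "N_1", "uplink": "wan2", "ip": "192.168.128.201",
     "timeSeries": [{"lossPercent": 0.0}, {"lossPercent": 41.7}]},
]
print(max_loss_by_network(sample, "192.168.128.201"))  # {'N_1': 41.7}
```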
Network Tags
Navigate to Organization > Overview. Select the network you wish to tag and add one tag for each IPsec peer. Tags should be in the format:
As an example, if the primary VPN endpoint is London and the backup is Paris, the tags would be:
london_primary_up (default state for primary is up)
paris_backup_down (default state for the backup is down)
The script below will change the up/down state of these tags when loss is detected on the primary peer (tracked per the section above).
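The up/down state change can be sketched independently of the API calls. This is a minimal illustration, assuming tags end in `_up` or `_down` and contain `_primary_` or `_backup_` as in the London/Paris example; the `flip_state` and `swap_vpn_tags` names are hypothetical.

```python
def flip_state(tag: str) -> str:
    """Flip the trailing _up/_down state of a tag; leave other tags alone."""
    if tag.endswith("_up"):
        return tag[: -len("_up")] + "_down"
    if tag.endswith("_down"):
        return tag[: -len("_down")] + "_up"
    return tag


def swap_vpn_tags(tags):
    """Flip every tag that names a primary or backup peer."""
    return [flip_state(t) if ("_primary_" in t or "_backup_" in t) else t
            for t in tags]


print(swap_vpn_tags(["london_primary_up", "paris_backup_down"]))
# ['london_primary_down', 'paris_backup_up']
```

Applying `swap_vpn_tags` a second time restores the original tags, which matches the swap-back step once the primary peer is healthy again.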
Site-to-Site VPN
Navigate to Security & SD-WAN > Site-to-Site VPN and add one peer for the primary and one for the secondary. Both peers have the same private subnets, but this does not cause an overlap conflict because the availability selector scopes each peer to a different network tag. Tag each peer with its corresponding tag configured in the section above.
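The effect of tag-scoped availability can be modeled in a few lines. This is a toy model rather than Dashboard behavior verified against a live organization: it assumes a peer applies to a network only when the peer's availability tag is present on that network, and that each peer carries the `_up` form of its tag. The peer names are hypothetical.

```python
def active_peers(network_tags, peers):
    """Return the names of peers whose availability tag is on the network."""
    return [name for name, tag in peers.items() if tag in network_tags]


# Assumption: each peer's availability tag is the "_up" form of its network tag.
peers = {"London DC": "london_primary_up", "Paris DC": "paris_backup_up"}

print(active_peers({"london_primary_up", "paris_backup_down"}, peers))  # ['London DC']
print(active_peers({"london_primary_down", "paris_backup_up"}, peers))  # ['Paris DC']
```

Under these assumptions only one peer matches a network at any time, which is why the identical private subnets on both peers do not conflict.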
import requests, json, time

api_key = ''
url = 'https://api.meraki.com/api/v0/organizations//uplinksLossAndLatency'
header = {"X-Cisco-Meraki-API-Key": api_key, "Content-Type": "application/json"}
# Networks that have been failed over to the backup peer
networkDownList = []
while True:
    response = requests.get(url, headers=header)
    for network in response.json():
        # Only look at the tracked peer IP, skipping 8.8.8.8 and the wan1 uplink
        if network['ip'] != '8.8.8.8' and network['uplink'] != "wan1":
            print(network['networkId'])
            print(network['ip'])
            loss = False
            for iteration in network['timeSeries']:
                # Fail over when any sample shows 30% loss or more
                if iteration['lossPercent'] >= 30:
                    loss = True
                    network_info = requests.get("https://api.meraki.com/api/v0/networks/"+network['networkId'], headers=header)
                    print(network_info.json()['name'])
                    # The script expects the two VPN tags at indexes 1 and 2 after the split
                    tags = network_info.json()['tags'].split(' ')
                    if "_primary_down" in tags[1] or "_primary_down" in tags[2]:
                        print("VPN already swapped")
                        break
                    else:
                        print("Need to change VPN, recent loss - "+str(iteration['lossPercent']))
                        # Mark the primary as down and the backup as up
                        if "_primary_up" in tags[1]:
                            tags[1] = tags[1].split("_up")[0]+"_down"
                        if "_primary_up" in tags[2]:
                            tags[2] = tags[2].split("_up")[0]+"_down"
                        if "_backup_down" in tags[1]:
                            tags[1] = tags[1].split("_down")[0]+"_up"
                        if "_backup_down" in tags[2]:
                            tags[2] = tags[2].split("_down")[0]+"_up"
                        payload = {'tags': tags[2]+" "+tags[1]}
                        new_network_info = requests.put("https://api.meraki.com/api/v0/networks/"+network['networkId'], data=json.dumps(payload), headers=header)
                        networkDownList.append(network['networkId'])
                        break
            # No lossy sample in the window: swap back if this network was failed over
            if loss == False and network['networkId'] in networkDownList:
                print("Primary VPN healthy again..swapping back")
                network_info = requests.get("https://api.meraki.com/api/v0/networks/"+network['networkId'], headers=header)
                tags = network_info.json()['tags'].split(' ')
                # Mark the primary as up and the backup as down
                if "_primary_down" in tags[1]:
                    tags[1] = tags[1].split("_down")[0]+"_up"
                if "_primary_down" in tags[2]:
                    tags[2] = tags[2].split("_down")[0]+"_up"
                if "_backup_up" in tags[1]:
                    tags[1] = tags[1].split("_up")[0]+"_down"
                if "_backup_up" in tags[2]:
                    tags[2] = tags[2].split("_up")[0]+"_down"
                payload = {'tags': tags[1]+" "+tags[2]}
                new_network_info = requests.put("https://api.meraki.com/api/v0/networks/"+network['networkId'], data=json.dumps(payload), headers=header)
                networkDownList.remove(network['networkId'])
    print(networkDownList)
    print("Sleeping for 30s...")
    time.sleep(30)
Sample Output
N_573083052582988629 <--Network we are tracking
192.168.128.201 <--Primary VPN hub we are tracking
SD-WAN Hub <-- Network Name
Need to change VPN, recent loss - 41.7 <--Packet loss of 41.7% detected. Script above set to failover on 30% loss
Sleeping for 30s... <--continues to repeat process every 30s (adjustable in script)
N_573083052582988629
192.168.128.201
SD-WAN Hub
VPN already swapped
Sleeping for 30s...
N_573083052582988629
192.168.128.201
SD-WAN Hub
VPN already swapped
Sleeping for 30s...
.
...Repeats until 5 minutes of 0% loss
.
Sleeping for 30s...
N_573083052582988629
192.168.128.201
Primary VPN healthy again..swapping back <--Hasn't been any packet loss on the tracked IP for 5 minutes. Swap back
Sleeping for 30s...
Tags Before Failover
Tags During Failover