Configuring Redundancy

The Agile Director is designed to support the coexistence of multiple independent instances, enabling high availability, horizontal scaling, or both. Each instance of the Director operates as a completely isolated entity, so several approaches to redundancy are available. The choice between them depends on the use case and requirements of the deployment.

A few selected use cases and example deployments are described in the sections that follow. Note, however, that this list is not exhaustive, and approaches not listed here may suit a specific use case better.

Third-party load balancer

In this first scenario, multiple instances of the router are positioned behind an off-the-shelf load balancer. This could range from a simple Nginx proxy to a more sophisticated commercial load balancer, depending on the user’s specific needs and requirements.
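As a rough illustration of what such a load balancer does, the following Python sketch models round-robin selection across several router instances while skipping any instance that has been marked unhealthy. It is a toy model under stated assumptions, not the behaviour of any particular product: the class name, the `router-a:5000` style addresses, and the manual `mark_down` call (standing in for a real health check) are all hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy model of a load balancer in front of several router instances."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # Stand-in for a failed health check (hypothetical mechanism).
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call, skipping unhealthy ones.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy router instances available")


# Hypothetical instance addresses, for illustration only.
balancer = RoundRobinBalancer(["router-a:5000", "router-b:5000", "router-c:5000"])
first_three = [balancer.pick() for _ in range(3)]

balancer.mark_down("router-b:5000")
after_failure = [balancer.pick() for _ in range(4)]
```

With all instances healthy, requests rotate across all three. Once `router-b:5000` is marked down, it no longer receives traffic and the remaining two instances share the load.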

DNS Round Robin

A straightforward alternative to deploying multiple instances of the router behind a load balancer is to employ round-robin DNS. In this approach, each router instance is associated with a distinct A record in DNS, all sharing the same domain name. Clients querying this DNS server receive multiple entries in response. Typically, clients utilize a round-robin algorithm to resolve the IP address of the router, although some clients might randomly select an address from the available servers.

While this method effectively distributes the load across multiple instances, it has a few drawbacks. First, clients are not guaranteed to use any specific algorithm for selecting the IP address. Second, due to DNS caching in the network and on the client’s device, modifications to the DNS record (e.g., taking a node out of service or adding additional nodes) take longer to reach clients than changes made with some other techniques. Third, there is a dependency on the client being able to recognize this DNS server as authoritative. And fourth, this approach does not by itself provide high availability: if a node is offline, requests to that node will fail.
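To make the client side of this concrete, the sketch below shows how a client might collect every A record returned for a name and then try the addresses in order. The retry loop is an assumption about how a well-behaved client could be written. Nothing in DNS guarantees it, which is exactly the fourth drawback above. The hostname `router.example.com`, the documentation-range addresses, and the fake resolver and connector (used so the example runs without a network) are all hypothetical.

```python
import socket


def resolve_all(hostname, port, resolver=socket.getaddrinfo):
    """Return the unique IPv4 addresses published for hostname, in answer order."""
    infos = resolver(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def first_reachable(addresses, port, connect=socket.create_connection, timeout=2.0):
    """Try each address in turn and return the first that accepts a connection."""
    for ip in addresses:
        try:
            conn = connect((ip, port), timeout)
        except OSError:
            continue  # node is offline; the DNS answer alone could not tell us
        conn.close()
        return ip
    return None


# --- Offline demonstration with hypothetical stand-ins ---------------------
def fake_resolver(hostname, port, family, socktype):
    # Mimics a DNS answer holding two A records for the same name.
    return [
        (family, socktype, 6, "", ("192.0.2.10", port)),
        (family, socktype, 6, "", ("192.0.2.11", port)),
    ]


class _FakeConn:
    def close(self):
        pass


def fake_connect(address, timeout):
    # Pretend the first node has been taken out of service.
    if address[0] == "192.0.2.10":
        raise ConnectionRefusedError("node offline")
    return _FakeConn()


addresses = resolve_all("router.example.com", 443, resolver=fake_resolver)
chosen = first_reachable(addresses, 443, connect=fake_connect)
```

A client that only ever uses the first address would fail here until the stale record expired from its cache, whereas this client falls through to the second address.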

VRRP

If horizontal scaling is not required, employing a shared Virtual IP (VIP) address through VRRP may suffice. This can be configured on two or more instances of the router by installing and setting up keepalived. While the specific details of keepalived are beyond the scope of this document, it can establish a Primary/Backup node configuration, where heartbeat packets and server priority determine each node’s role. Using VRRP, the VIP always points to the current primary node, and in the event of a primary node failure, the remaining nodes elect a new primary node based on the predetermined priority.

This solution has some drawbacks compared to others. The VIP may only be assigned to a single node at a time, which makes horizontal scaling impossible using this solution alone. In addition, if the primary node fails, there will be a brief period, usually around one second, before the failure is detected and the VIP is reassigned.
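The priority-based election described above can be sketched in a few lines of Python. This is a deliberately simplified model: real VRRP decides liveness from advertisement packets and timers, whereas here a node's state is just a flag. The node names and priority values are hypothetical; the only detail taken from keepalived is that a higher `priority` wins.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int       # higher wins, as with keepalived's `priority` setting
    alive: bool = True  # stand-in for "heartbeats are still being received"


def elect_primary(nodes):
    """Return the live node with the highest priority, or None if all are down."""
    live = [n for n in nodes if n.alive]
    return max(live, key=lambda n: n.priority, default=None)


# Hypothetical two-node group sharing one VIP.
group = [Node("router-1", 150), Node("router-2", 100)]

before = elect_primary(group).name  # router-1 holds the VIP

group[0].alive = False              # the primary fails
after = elect_primary(group).name   # router-2 takes over the VIP
```

Because `elect_primary` returns exactly one node, the model also reflects the scaling limitation: only one member of the group serves traffic at any time.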

This solution can be more effectively utilized as part of a larger combined solution in which several groups of nodes are employed. In each group, there is a Primary and Backup, and selection between the groups is handled via one of the other scaling solutions.

An Example Combined Approach Scenario

The following is an example scenario in which eight independent router nodes are deployed using all three of the above approaches.

The scenario consists of four pairs of nodes, with each pair using VRRP to designate one node as Primary and the other as Backup. The pairs are divided into two groups of two, and a load balancer sits in front of each group, configured with the VIP addresses of the two pairs behind it. The two load balancers are then placed into round-robin DNS under the same FQDN.

Clients would resolve the DNS record, which yields the addresses of both load balancers, and send requests to each in turn. The load balancer at the receiving address would then select between the two VIP addresses behind it, and the request would be routed to the current Primary node of that pair.

In this scenario, if any one of the primary nodes goes offline, VRRP moves the VIP to the backup node. While that process is ongoing, the load balancer detects that the VIP is unreachable and routes all traffic in the meantime to the VIP of the second pair. The DNS round robin distributes the traffic evenly between the two load balancers.
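The request path through this topology can be traced with a small simulation. The sketch below is only a model under simplifying assumptions: failover is instantaneous, both DNS and the load balancers use strict round robin, and every name (`lb-1`, `vip-a`, `node-1`, and so on) is hypothetical.

```python
from itertools import cycle

# Hypothetical topology: 2 load balancers, each fronting 2 VRRP pairs (8 nodes).
# Within each pair, nodes are listed in priority order (Primary first).
TOPOLOGY = {
    "lb-1": {"vip-a": ["node-1", "node-2"], "vip-b": ["node-3", "node-4"]},
    "lb-2": {"vip-c": ["node-5", "node-6"], "vip-d": ["node-7", "node-8"]},
}
offline = set()


def vip_owner(pair):
    """VRRP layer: the first live node in priority order holds the VIP."""
    for node in pair:
        if node not in offline:
            return node
    return None  # VIP currently unreachable


def make_router(topology):
    dns = cycle(topology)  # DNS layer: round robin across load balancers
    vip_rings = {lb: cycle(vips) for lb, vips in topology.items()}

    def route():
        lb = next(dns)
        vips = topology[lb]
        # Load balancer layer: rotate over VIPs, skipping unreachable ones.
        for _ in range(len(vips)):
            vip = next(vip_rings[lb])
            node = vip_owner(vips[vip])
            if node is not None:
                return lb, vip, node
        raise RuntimeError(f"no reachable VIP behind {lb}")

    return route


route = make_router(TOPOLOGY)
normal = [route() for _ in range(4)]    # all Primary nodes serving

offline.add("node-1")                   # Primary of the first pair fails
failover = [route() for _ in range(4)]  # node-2 now answers on vip-a
```

In `normal`, the four requests land on the four Primary nodes. In `failover`, the request that reaches `vip-a` is served by `node-2` instead. If both nodes of a pair are marked offline, which can also stand in for the short window while VRRP is reassigning the VIP, the load balancer in this model sends that request to the other VIP behind it.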