Setup for Redundancy
The Agile Director is designed to support multiple independent instances running side by side, which enables high availability, horizontal scaling, or both. Each instance of the Director operates as a complete, isolated entity, so several approaches to redundancy are possible. The right choice depends on the use case and requirements of the deployment.
A few select use cases and example deployments are described in the sections that follow. This list is not exhaustive, and an approach not listed here may suit a specific use case better.
Third-party load balancer
In this first scenario, multiple instances of the router are positioned behind an off-the-shelf load balancer. This could range from a simple Nginx proxy to a more sophisticated commercial load balancer, depending on the user’s specific needs and requirements.
DNS Round Robin
A straightforward alternative to deploying multiple instances of the router behind a load balancer is to employ round-robin DNS. In this approach, each router instance is associated with a distinct A record in DNS, all sharing the same domain name. Clients that query this DNS server receive multiple entries in response. Typically, clients cycle through the returned addresses in round-robin fashion, although some clients might randomly select one of the available addresses.
While this method distributes the load across multiple instances, it has several drawbacks. First, clients are not guaranteed to use any specific algorithm for selecting the IP address. Second, because DNS responses are cached in the network and on the client’s device, changes to the DNS records (e.g., taking a node out of service or adding nodes) take longer to reach clients than with some other techniques. Third, clients must be able to recognize this DNS server as authoritative. Fourth, this approach does not provide any means of high availability: if a node is offline, requests to that node will fail.
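As a rough illustration of the behaviour described above, the following Python sketch shows one way a client could cycle through the returned A records and skip an address that does not answer, which is the failover that round-robin DNS does not provide by itself. The helper names (`resolve_all`, `rotate`, `tcp_probe`, `first_reachable`) and the port number are assumptions made for this example and are not part of the Director.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_all(hostname: str, port: int = 443) -> List[str]:
    """Return every distinct IPv4 address the resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def rotate(addresses: List[str], request_number: int) -> List[str]:
    """Order the addresses for one request, starting one entry later each time."""
    if not addresses:
        return []
    start = request_number % len(addresses)
    return addresses[start:] + addresses[:start]


def tcp_probe(ip: str, port: int = 443, timeout: float = 1.0) -> bool:
    """Report whether a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(
    candidates: Iterable[str], is_reachable: Callable[[str], bool]
) -> Optional[str]:
    """Return the first candidate accepted by is_reachable, or None."""
    for ip in candidates:
        if is_reachable(ip):
            return ip
    return None
```

A client could then call `first_reachable(rotate(resolve_all("director.example.com"), n), tcp_probe)` for its n-th request, where `director.example.com` is a placeholder name. Cached DNS answers still apply, so this only works around an offline node; it does not make record changes reach the client any faster.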
VRRP
If horizontal scaling is not required, employing a shared Virtual IP Address through the VRRP protocol may suffice. This can be easily configured on one or more instances of the router by installing and setting up keepalived. While the specific details of keepalived are beyond the scope of this document, it can establish a Primary/Backup node configuration, where heartbeat packets and server priority determine each node’s role. Using VRRP, a Virtual IP address always points to the current primary node, and in the event of a primary node failure, the remaining nodes elect a new primary node based on the predetermined priority.
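The priority-based election described above can be pictured with a small Python model. This is a simplified sketch, not keepalived's implementation: the `Node` class, its `alive` flag (standing in for "heartbeats are still being received"), and the node names are hypothetical, and details such as tie-breaking between equal priorities are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    priority: int        # higher value wins the election
    alive: bool = True   # stands in for "heartbeats are still arriving"


def elect_primary(nodes: List[Node]) -> Optional[Node]:
    """Return the live node with the highest priority, or None if all are down."""
    candidates = [node for node in nodes if node.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda node: node.priority)


# Example: the VIP follows the elected primary.
pair = [Node("router-a", priority=150), Node("router-b", priority=100)]
primary = elect_primary(pair)        # router-a holds the VIP
pair[0].alive = False                # router-a stops sending heartbeats
new_primary = elect_primary(pair)    # router-b takes over the VIP
```

In this model the VIP has exactly one owner at any moment, which matches the Primary/Backup roles described in the text.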
This solution has some drawbacks compared to others. The VIP can be assigned to only one node at a time, so this solution alone cannot provide horizontal scaling. In addition, if the primary node fails, there is a brief period, usually around one second, before the failure is detected and the VIP is reassigned.
This solution can be more effectively utilized as part of a larger combined solution in which several groups of nodes are employed. In each group, there is a Primary and Backup, and selection between the groups is handled via one of the other scaling solutions.
An Example Combined Approach Scenario
The following example scenario uses eight independent router nodes and combines all three of the approaches above.
The scenario consists of four pairs of nodes, with each pair using VRRP to designate one node as Primary and one as Backup. Two load balancers sit in front of these pairs, each one serving two pairs and configured with the VIP addresses of the two pairs behind it. The two load balancers are then placed into round-robin DNS under the same FQDN.
A client would resolve the DNS record, which yields the addresses of the two load balancers, and would then send requests to each in turn. The load balancer at the chosen address would select one of the two VIP addresses behind it, and the request would be routed to the current Primary node of that pair.
In this scenario, if any one of the primary nodes goes offline, VRRP will move the VIP to the backup node. While that process is ongoing, the load balancer will detect that the VIP is unreachable and will route all traffic to the VIP of its second pair in the meantime. The DNS round-robin distributes the traffic evenly between the two load balancers.
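To make the request path concrete, the following Python sketch models the eight-node layout as plain data and traces where a request ends up. All names (`lb-1`, `node-1`, and so on), the `TOPOLOGY` table, and the strict alternation used for both the DNS and load balancer steps are assumptions made for illustration. The model shows the state after VRRP has finished moving a VIP; the short failover window is not simulated.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layout: two load balancers, each in front of two VRRP pairs.
# Within a pair, nodes are listed in priority order (Primary first).
TOPOLOGY: Dict[str, List[List[str]]] = {
    "lb-1": [["node-1", "node-2"], ["node-3", "node-4"]],
    "lb-2": [["node-5", "node-6"], ["node-7", "node-8"]],
}


def vip_owner(pair: List[str], down: Set[str]) -> Optional[str]:
    """Return the highest-priority healthy node of a pair, which holds the VIP."""
    for node in pair:
        if node not in down:
            return node
    return None


def route(request_number: int, down: Set[str]) -> Optional[str]:
    """Trace one request through DNS round robin, a load balancer, and a VIP."""
    balancers = sorted(TOPOLOGY)
    # Step 1: DNS round robin alternates between the two load balancers.
    lb = balancers[request_number % len(balancers)]
    pairs = TOPOLOGY[lb]
    # Step 2: the load balancer alternates between its VIPs and skips any
    # VIP whose pair has no healthy node left.
    start = (request_number // len(balancers)) % len(pairs)
    for offset in range(len(pairs)):
        owner = vip_owner(pairs[(start + offset) % len(pairs)], down)
        if owner is not None:
            return owner
    return None
```

With every node healthy, four consecutive requests land on `node-1`, `node-5`, `node-3`, and `node-7`. If `node-1` is down, its pair's VIP is held by `node-2`; if both are down, `lb-1` sends the request to `node-3` instead. DNS round robin does not fail over between load balancers, so a request whose load balancer has no healthy pair returns `None` in this model.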