Routing Engine

How the routing engine works and how to configure it

The central component of ESB3024 Router is the routing engine. It is what analyses an incoming video request and determines which CDN, if any, to pass it on to based on a variety of factors such as geographic location, content type and current server load.

Like most of the router, the routing engine is configured through the /v2/configuration API; see Configuration for general configuration information. The basic structure looks something like this:

{
  // ...
  "cdns": [
    {
      "id": "cdn1",
      "http_port": 80,
      "https_port": 443,
      "manifest_availability_check": {
        "enabled": false,
        "session_group_ids": []
      },
      "redirecting": false
    }
  ],
  "hosts": [
    {
      "id": "host1",
      "cdn_id": "cdn1",
      "host": "host1.example.com",
      "ipv6_address": "fc00:ed6e:0:201:0:aff:fe10:306b"
    },
    {
      "id": "host2",
      "cdn_id": "cdn1",
      "host": "host2.example.com"
    }
  ],
  // ...
  "routing": {
    "id": "routing_table",
    "member_order": "weighted",
    "members": [
      {
        "id": "node1",
        "host_id": "host1",
        "weight_function": "return 100"
      },
      {
        "id": "node2",
        "host_id": "host2",
        "weight_function": "return 100"
      }
    ],
    "weight_function": "return 1"
  },
  // ...
}

In the above example the routing engine configuration is contained in the "routing" field of the JSON structure. The "cdns" and "hosts" fields are included to provide a context and show where the "host_id" values come from.
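The relationship between the three fields can be checked mechanically. The following is a minimal sketch in Python, not part of the router itself: the function name check_references is hypothetical, and the configuration is assumed to have already been parsed into a dictionary with the "// ..." placeholders removed.

```python
def check_references(config):
    """Return a list of dangling cdn_id / host_id references (empty if none)."""
    problems = []
    cdn_ids = {cdn["id"] for cdn in config.get("cdns", [])}
    host_ids = {host["id"] for host in config.get("hosts", [])}

    # Every host must point at a CDN that exists in "cdns".
    for host in config.get("hosts", []):
        if host["cdn_id"] not in cdn_ids:
            problems.append(
                f"host {host['id']!r} references unknown CDN {host['cdn_id']!r}"
            )

    # Every leaf node in the routing tree must point at a host in "hosts".
    def walk(node):
        if "host_id" in node and node["host_id"] not in host_ids:
            problems.append(
                f"node {node['id']!r} references unknown host {node['host_id']!r}"
            )
        for member in node.get("members", []):
            walk(member)

    walk(config["routing"])
    return problems


config = {
    "cdns": [{"id": "cdn1"}],
    "hosts": [
        {"id": "host1", "cdn_id": "cdn1", "host": "host1.example.com"},
        {"id": "host2", "cdn_id": "cdn1", "host": "host2.example.com"},
    ],
    "routing": {
        "id": "routing_table",
        "member_order": "weighted",
        "members": [
            {"id": "node1", "host_id": "host1", "weight_function": "return 100"},
            {"id": "node2", "host_id": "host2", "weight_function": "return 100"},
        ],
        "weight_function": "return 1",
    },
}

print(check_references(config))  # prints []
```

Whether the router performs an equivalent check when a configuration is submitted is not stated here, so a check like this is only a convenience before uploading.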

Tree components

The "routing" part of the configuration is a tree of routing nodes. Each node is one of two types: a leaf node or a branch node. Leaf nodes contain references to CDN hosts, while branch nodes contain a list of member nodes but do not reference any hosts themselves.

All nodes, regardless of type, have a unique ID as well as a weight function. The ID is used to identify the node in e.g. rule evaluation metrics and the weight function is used to dynamically determine if a node should be evaluated for the incoming request.
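As an illustration of the two rules above (IDs are unique across the whole tree, and every node is either a leaf or a branch), here is a small Python sketch that walks a parsed routing tree and reports violations. The names iter_nodes and lint_tree are hypothetical, and telling the node types apart by the presence of "host_id" or "members" is an assumption made for this sketch.

```python
from collections import Counter


def iter_nodes(node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for member in node.get("members", []):
        yield from iter_nodes(member)


def lint_tree(root):
    """Return a list of problems found in a routing tree (empty if none)."""
    problems = []

    # IDs must be unique across leaf and branch nodes alike.
    id_counts = Counter(node["id"] for node in iter_nodes(root))
    for node_id, count in id_counts.items():
        if count > 1:
            problems.append(f"id {node_id!r} is used {count} times")

    # A node should have exactly one of "host_id" (leaf) and "members" (branch).
    for node in iter_nodes(root):
        if ("host_id" in node) == ("members" in node):
            problems.append(
                f"node {node['id']!r} is neither a clear leaf nor a clear branch"
            )
    return problems


tree = {
    "id": "root",
    "members": [
        {"id": "leaf_a", "host_id": "host1"},
        {"id": "leaf_a", "host_id": "host2"},  # duplicate ID
    ],
}

print(lint_tree(tree))  # prints ["id 'leaf_a' is used 2 times"]
```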

Let us go through the two structures one by one and look at their fields and possible values.

Branch node

Branch nodes do not refer to any hosts by themselves, but contain member nodes that may be traversed in order to find a suitable host.

{
  "id": "branch_node_example",
  "members": [],
  "member_order": "weighted",
  "weight_function": "return 1"
}

"id": The unique identifier for a node. Its value cannot be used on any other node, not even a leaf node.
"weight_function": A Lua function body returning an integer value. Exactly how the return value is used depends on the parent node’s member_order field, but a higher value generally indicates higher likelihood of the node being traversed. The value of 0 or less means that this node should not be used.
"members": A list of child nodes that will be evaluated if this node is selected by the routing engine. How they are evaluated depends on the value of the member_order field.
"member_order": A string determining how this node is to be evaluated. Allowed values are "sequential", "sorted" and "weighted".

Member Order Evaluation

As mentioned above, there are three different member orders and they affect how children of a branch node are traversed.

"sequential" means that all children are evaluated in the order they are listed in the configuration. If a child’s weight function returns a positive value, it will be picked for traversal even if one of its siblings has a higher weight. In order to skip a child, its weight function must return 0 or less.

If a child is picked for traversal but none of the leaf nodes in its subtree return a positive weight the routing engine will move on to the next sibling and attempt to find a leaf node there instead. This will continue until a leaf node is found or the entire tree is exhausted. If no valid leaf is found an error will be sent to the client.
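The sequential order, including the fallback to the next sibling, can be sketched in a few lines of Python. This is a simplified model rather than the router's implementation: pick_sequential is a hypothetical name, weights are given as precomputed integers instead of Lua functions, and subtree_ok stands in for "a leaf with a positive weight was found below this child".

```python
def pick_sequential(children, weights, subtree_ok):
    """Return the first child, in configuration order, that yields a host."""
    for child in children:
        if weights[child] <= 0:
            continue  # a weight of 0 or less means "skip this child"
        if subtree_ok(child):
            return child  # picked even if a later sibling has a higher weight
        # No usable leaf below this child: fall through to the next sibling.
    return None  # tree exhausted; the client would receive an error


children = ["first", "second", "third"]
weights = {"first": 0, "second": 1, "third": 50}

print(pick_sequential(children, weights, lambda c: True))           # prints second
print(pick_sequential(children, weights, lambda c: c != "second"))  # prints third
```

Note that "second" wins in the first call although "third" has a much higher weight, which is the defining property of this member order.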

"sorted" will cause the routing engine to compute the weight of each child node, using each child’s "weight_function" Lua code, sort them from highest to lowest and pick the first node.

If no leaf node within that child’s subtree is accepted, i.e. they all return 0 or less, then the next child will be traversed and so on until a suitable host is found or there are no more subtrees left to evaluate. If no host is found, an error message will be sent to the client.
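A corresponding Python sketch for the sorted order follows, under the same simplifications: pick_sorted is a hypothetical name, weights are precomputed integers, and subtree_ok stands in for the search below a child. How the router breaks ties between equal weights is not described here; this sketch keeps configuration order for ties, which is an assumption.

```python
def pick_sorted(children, weights, subtree_ok):
    """Try children from highest to lowest weight until one yields a host."""
    # sorted() is stable, so children with equal weights keep their listed order.
    ranked = sorted(children, key=lambda child: weights[child], reverse=True)
    for child in ranked:
        if weights[child] <= 0:
            break  # every remaining child is rejected as well
        if subtree_ok(child):
            return child
    return None  # no host found; the client would receive an error


children = ["a", "b", "c"]
weights = {"a": 5, "b": 20, "c": 0}

print(pick_sorted(children, weights, lambda c: True))      # prints b
print(pick_sorted(children, weights, lambda c: c != "b"))  # prints a
```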

"weighted" tells the routing engine to evaluate the weight functions for all children, and then pick one of them randomly with a probability equal to the child’s weight divided by the sum of the weights of all immediate children.

Just as with the other two variants, if no suitable leaf node is found within the child’s subtree one of its siblings is picked instead. The same random method is used to pick subsequent children until a leaf node is accepted or the entire tree is exhausted. An error message will be sent to the client if no host is found.
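As a worked example of the probability rule, two children with weights 300 and 100 would be picked with probability 300/400 = 75 % and 100/400 = 25 % respectively. The Python sketch below models the weighted order with the standard library's random.choices; pick_weighted is a hypothetical name, and dropping a failed child before drawing again is one reading of "the same random method is used to pick subsequent children".

```python
import random


def pick_weighted(children, weights, subtree_ok, rng=random):
    """Draw children at random, proportionally to weight, until one yields a host."""
    candidates = [child for child in children if weights[child] > 0]
    while candidates:
        chosen = rng.choices(
            candidates, weights=[weights[c] for c in candidates], k=1
        )[0]
        if subtree_ok(chosen):
            return chosen
        candidates.remove(chosen)  # draw again among the remaining siblings
    return None  # no host found; the client would receive an error


children = ["node1", "node2", "node3"]
weights = {"node1": 300, "node2": 100, "node3": 0}

rng = random.Random(0)  # seeded only to make the experiment repeatable
picks = [pick_weighted(children, weights, lambda c: True, rng) for _ in range(10000)]
print(picks.count("node1") / len(picks))  # close to 0.75
print("node3" in picks)                   # prints False
```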

Leaf node

Leaf nodes contain a reference to a host object rather than more child nodes.

{
  "id": "leaf_node_example",
  "host_id": "cdn_host_id",
  "weight_function": "return 100"
}

"id": The unique identifier for a node. Its value cannot be used on any other node, not even a branch node.
"host_id": The ID of one of the hosts in the "hosts" list. Represents the host the client will be routed through if this leaf node is selected.
"weight_function": A Lua function body that returns the weight of this particular node. The return value is used by the parent node to determine which leaf node to select for the current request.

A Note on Weight Functions

While the primary and mandatory purpose of the weight function is to return a value to let the routing engine determine which leaf node is most suitable for a client request, it can do more things. It is, however, vital that a weight function always returns an integer value.

There is a global Lua context that the functions can use to set and read global state, and third-party software can keep track of external state and values through the selection_input API. It is even possible to set up a dummy node that prints debug information but always returns 0, so that it has no effect on the actual routing.

Due to the global nature of variables in Lua it is important to remember to declare any variables used for temporary calculations as local or risk cross-contamination between client requests. It is generally recommended to avoid using global variables at all in the router unless absolutely certain what the consequences may be.

A Practical Example

Imagine we have a system for the Swedish market with a private CDN whose capacity is sufficient most of the time, plus rented offload capacity in a third-party CDN for peak-hour traffic and for any clients connecting from outside Sweden. The private CDN is split into one host for live content and one for VOD, while the offload CDN has a single host capable of handling anything. In this example the private CDN also reports its remaining capacity, in per cent, to the selection_input variable capacity_percent, and it should accept new requests as long as at least 10 % is unused.

The router configuration will have to set up a few hosts, classifiers for incoming requests and finally a routing tree which we will go through in detail further down:

{
  "cdns": [
    {
      "http_port": 80,
      "https_port": 443,
      "id": "offload",
      "manifest_availability_check": {
        "enabled": false,
        "session_group_ids": []
      },
      "redirecting": false
    },
    {
      "http_port": 80,
      "https_port": 443,
      "id": "cdn",
      "manifest_availability_check": {
        "enabled": false,
        "session_group_ids": []
      },
      "redirecting": false
    }
  ],
  "hosts": [
    {
      "cdn_id": "offload",
      "host": "offload.example",
      "ipv6_address": "fc00:ed6e:0:201:0:aff:fe10:306b",
      "id": "offload"
    },
    {
      "cdn_id": "cdn",
      "host": "vod.cdn.example",
      "id": "cdn_vod"
    },
    {
      "cdn_id": "cdn",
      "host": "live.cdn.example",
      "id": "cdn_live"
    }
  ],
  "session_groups": [
    {
      "id": 1,
      "name": "Not Sweden",
      "classifiers": [
        [
          {
            "id": 1,
            "inverted": true,
            "name": "Not Sweden",
            "rule": {
              "rule_type": "geoip_rule",
              "source": "session/client_ip",
              "country": "Sweden"
            }
          }
        ]
      ]
    },
    {
      "id": 2,
      "name": "IsLive",
      "classifiers": [
        [
          {
            "id": 1,
            "inverted": false,
            "name": "Is live content",
            "rule": {
              "pattern": "*/live/*",
              "rule_type": "string_match_rule",
              "source": "session/content_url_path"
            }
          }
        ]
      ]
    }
  ],
  "routing": {
    "id": "routing_table",
    "member_order": "sequential",
    "members": [
      {
        "id": "Offload if not Sweden",
        "host_id": "offload",
        "weight_function": "local retval = 0; if session_groups[\"Not Sweden\"] \
          then retval = 1; end; return retval"
      },
      {
        "id": "Private CDN",
        "member_order": "sorted",
        "members": [
          {
            "id": "live",
            "host_id": "cdn_live",
            "weight_function": "local retval = 0; if (selection_input.capacity_percent \
              or 0) >= 10 and session_groups.IsLive then retval = 1; end; return retval"
          },
          {
            "id": "vod",
            "host_id": "cdn_vod",
            "weight_function": "local retval = 1; if (selection_input.capacity_percent \
              or 0) < 10 or session_groups.IsLive then retval = 0; end; return retval"
          }
        ],
        "weight_function": "return 1"
      },
      {
        "id": "Offload if no match",
        "host_id": "offload",
        "weight_function": "return 1"
      }
    ],
    "weight_function": "return 1"
  }
}

In order to understand this configuration, let us pretend some content is requested from a client:

Client IP: 94.127.35.102
Content path: "/live/news.m3u8"

The first thing that will happen is that the session group classifiers will analyse the request.

The IP is geolocated to Stockholm, Sweden. This means that the classifier named "Not Sweden" will be negative¹, which in turn means that the session group named "Not Sweden" will also be negative.

The content path contains the string "/live/" which causes the classifier named "Is live content" to be positive and the corresponding session group "IsLive" will also become positive as a result.
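The two classifier outcomes can be illustrated with a short Python sketch. It is only an approximation: classifier_result is a hypothetical helper, the geolocation lookup is replaced by a hard-coded boolean, and Python's fnmatch is used as a stand-in for the router's own string matching, whose exact pattern syntax is not specified here.

```python
from fnmatch import fnmatchcase


def classifier_result(rule_matched, inverted):
    """An inverted classifier flips the outcome of its rule."""
    return rule_matched != inverted


# "Not Sweden": the geoip rule matches Sweden, and the classifier is inverted.
ip_is_in_sweden = True  # assumed result of geolocating 94.127.35.102
not_sweden = classifier_result(ip_is_in_sweden, inverted=True)

# "Is live content": wildcard match on the content path, not inverted.
path_matches = fnmatchcase("/live/news.m3u8", "*/live/*")
is_live = classifier_result(path_matches, inverted=False)

print(not_sweden)  # prints False
print(is_live)     # prints True
```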

With the classifiers done, it is time to traverse the routing tree.²

First the root node is evaluated. Its weight function returns a positive value, so its child nodes will be traversed.³ The member order is set to "sequential" so the routing engine goes through them one by one in the order they are written in the configuration and picks the first one to return a positive value.

"Offload if not Sweden" comes first, so its weight function is called. Since the session group "Not Sweden" is false, the weight function will return 0 and the engine moves on to the next sibling.

"Private CDN" has two children and the member order "sorted" meaning that they will both be evaluated, and the one with the highest weight will be picked. Assuming that the CDN capacity is sufficient, the child "live" will return 1 and "vod" will return 0, meaning that the host object with ID "cdn_live" will be selected and the client redirected to live.cdn.example/live/news.m3u8.

However, if the capacity is insufficient, neither child will return a positive value and the engine continues on to "Offload if no match" which always returns 1 and is therefore guaranteed to match and cause any unclaimed requests to end up at offload.example/live/news.m3u8.
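The whole walkthrough can be reproduced with a small Python model of the routing tree. This is a sketch for reasoning about the configuration, not the router's actual algorithm: select_host and has_capacity are hypothetical names, the Lua weight functions are rewritten as Python lambdas, only the "sequential" and "sorted" orders used by this example are modelled, and the capacity threshold follows the description in the text (at least 10 % unused).

```python
def has_capacity(selection_input):
    """True while at least 10 % of the private CDN's capacity is unused."""
    return selection_input.get("capacity_percent", 0) >= 10


TREE = {
    "id": "routing_table",
    "member_order": "sequential",
    "weight": lambda groups, inp: 1,
    "members": [
        {"id": "Offload if not Sweden", "host_id": "offload",
         "weight": lambda groups, inp: 1 if groups["Not Sweden"] else 0},
        {"id": "Private CDN", "member_order": "sorted",
         "weight": lambda groups, inp: 1,
         "members": [
             {"id": "live", "host_id": "cdn_live",
              "weight": lambda groups, inp:
                  1 if has_capacity(inp) and groups["IsLive"] else 0},
             {"id": "vod", "host_id": "cdn_vod",
              "weight": lambda groups, inp:
                  0 if not has_capacity(inp) or groups["IsLive"] else 1},
         ]},
        {"id": "Offload if no match", "host_id": "offload",
         "weight": lambda groups, inp: 1},
    ],
}


def select_host(node, groups, inp):
    """Return the host_id of the first acceptable leaf, or None if there is none."""
    if node["weight"](groups, inp) <= 0:
        return None
    if "host_id" in node:
        return node["host_id"]
    members = node["members"]
    if node["member_order"] == "sorted":
        members = sorted(
            members, key=lambda m: m["weight"](groups, inp), reverse=True
        )
    for member in members:  # fall back to the next sibling on failure
        host = select_host(member, groups, inp)
        if host is not None:
            return host
    return None


swedish_live = {"Not Sweden": False, "IsLive": True}
print(select_host(TREE, swedish_live, {"capacity_percent": 50}))  # prints cdn_live
print(select_host(TREE, swedish_live, {"capacity_percent": 5}))   # prints offload
```

The first call corresponds to the case where the private CDN has sufficient capacity, and the second to the case where both children of "Private CDN" return 0 and the request falls through to "Offload if no match".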

¹ The classifier rule actually evaluates positively against IP addresses located in Sweden, but since it is marked as inverted, the result is flipped and ends up negative instead. This is much simpler than trying to write rules that match against everything except Sweden.
² Note that some of the weight functions are too long to fit on a single line. They have been split into multiple lines with a backslash indicating the break point. These backslashes and line breaks must be removed for the configuration to be valid and accepted by the router.
³ This is not strictly necessary. By default, any node without an explicit weight function will have "return 100" instead. Explicit functions are used in this example for clarity.