Content popularity

How to tune content popularity parameters and use it in routing

ESB3024 Router allows routing decisions based on content popularity. All incoming content requests are tracked to continuously update a content popularity ranking list. The popularity ranking algorithm is designed to let popular content quickly rise to the top while unpopular content decays and sinks towards the bottom.

Configuration

All configuration parameters for content popularity reside in the settings object of the configuration, an example of which can be seen below:

{
  "settings": {
    "content_popularity": {
      "algorithm": "scored_based",
      "session_group_names": ["vod_only"],
      "score_based:": {
        "requests_between_popularity_decay": 1000,
        "popularity_list_max_size": 100000,
        "popularity_prediction_factor": 2.5,
        "popularity_decay_fraction": 0.2
      },
      "time_based": {
        "intervals_per_hour": 10
      }
    }
  }
}

The field algorithm dictates which content popularity tracking algorithm to use, can either be score_based or time_based.

The field session_group_names defines the sessions for which content popularity should be tracked. In the example above, session belonging to the vod_only session group will be tracked for content popularity. If left empty, content popularity will be tracked for all sessions.

The remaining configuration parameters are algorithm specific.

Score based algorithm

The field popularity_list_max_size defines the maximum amount of unique contents to track for popularity. This can be used to limit memory growth. A single entry in the popularity ranking list will at most consume 180 B of memory, giving an upper bound memory growth of \(180n_{\text{requests}}\) bytes. E.g. using "popularity_list_max_size": 1000 would consume at most 180⋅1,000 = 180,000 B = 0.18 MB. If the content popularity list is full, a request to unique content would replace the least popular content.

Setting a very high max size won’t impact performance, it will only consume more memory.

The field requests_between_popularity_decay defines the number of requests between each popularity decay update, an integral component of this feature.

The fields popularity_prediction_factor and popularity_decay_fraction tune the behaviour of the content popularity ranking algorithm, explained further below.

Decay update

To allow for popular content to quickly rise in popularity and unpopular content to sink, a dynamic popularity ranking algorithm is used. The goal of the algorithm is to track content popularity in real time, allowing routing decisions based on the requested content’s popularity. The algorithm is applied every decay update.

The algorithm uses current trending content to predict content popularity. The field popularity_prediction_factor regulates how much the algorithm should rely on predicted popularity. A high prediction factor allows rising content to quickly rise to high popularity but can also cause unpopular content with a sudden burst of requests to wrongfully rise to the top. A low prediction factor can cause stagnation in the popularity ranking, not allowing new popular content to rise to the top.

Unpopular content decays in popularity, the magnitude of which is regulated by popularity_decay_fraction. A high value will aggressively decay content popularity every decay update while a low value will bloat the ranking, causing stagnation. Once content decays to a trivially low popularity score, it is pruned from the content popularity list.

When configuring these tuning parameters, the most crucial data to consider is the size of your asset catalog, i.e. the number of unique contents you offer. The recommended values, obtained through testing, are presented in the table below. Note that the field popularity_prediction_factor is the principal factor in controlling the algorithm’s behaviour.

Catalog size \(n\)popularity_prediction_factorpopularity_decay_fraction
\(n\) < 10002.20.2
1000 < \(n\) < 50002.30.2
5000 < \(n\) < 100002.50.2
\(n\) > 100002.60.2

Time based algorithm

The time based algorithm only requires the configuration parameter intervals_per_hour. E.g., the value "intervals_per_hour": 10 would give 10 six minute intervals per hour. During each interval, all unique content requests has an associated counter, increasing by one for each incoming request. After an hour, all intervals have been cycled through. The counters in the first interval will be reset and all incoming content requests will increase the counters in the first interval again. This cycle continues forever.

When determining a single content’s popularity, the sum of each content’s counter in all intervals is used to determine a popularity ranking.

Usage in routing

Content popularity ranking is available in Lua through the simple member session.content_global_popularity, returning the requested content’s popularity ranking. E.g., the routing configuration below will route all requests to content with a popularity ranking lower than 10, i.e. top 10 most popular content, to the host edge-streamer while requests to content outside of the top 10 most popular will be routed to the host offload.

"routing": {
  "id": "routing_table",
  "member_order": "sequential",
  "members": [
    {
      "id": "edge",
      "weight_function": "return session.content_global_popularity < 11 and 1 or 0",
      "host_id": "edge-streamer"
    },
    {
      "id": "offload",
      "weight_function": "return 1",
      "host_id": "offload"
    }
  ]
}