Content popularity

How to tune content popularity parameters and use content popularity in routing

ESB3024 Router can make routing decisions based on content popularity. All incoming content requests are tracked to continuously update a content popularity ranking list. The popularity ranking algorithm is designed to let popular content quickly rise to the top while unpopular content decays and sinks towards the bottom.

Routing

A routing rule based on content popularity can be created by running:

$ confcli services.routing.rules -w
Running wizard for resource 'rules'

Hint: Hitting return will set a value to its default.
Enter '?' to receive the help string

rules : [
  rule can be one of
    1: allow
    2: consistentHashing
    3: contentPopularity
    4: deny
    5: firstMatch
    6: random
    7: rawGroup
    8: rawHost
    9: split
    10: weighted
  Choose element index or name: contentPopularity
  Adding a 'contentPopularity' element
    rule : {
      name (default: ): content_popularity_rule
      type (default: contentPopularity):
      contentPopularityCutoff (default: 10): 5
      onPopular (default: ): edge-streamer
      onUnpopular (default: ): offload
    }
  Add another 'rule' element to array 'rules'? [y/N]: n
]
Generated config:
{
  "rules": [
    {
      "name": "content_popularity_rule",
      "type": "contentPopularity",
      "contentPopularityCutoff": 5.0,
      "onPopular": "edge-streamer",
      "onUnpopular": "offload"
    }
  ]
}
Merge and apply the config? [y/n]: y

This rule will route requests for the top 5 most popular content items to edge-streamer and all other requests to offload.
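The decision made by this rule can be illustrated with a minimal Python sketch. It is not the router's implementation: the function name `route_by_popularity` and the plain list used as a ranking are hypothetical stand-ins, and only the cutoff semantics described above are modelled.

```python
def route_by_popularity(ranking, content_id, cutoff, on_popular, on_unpopular):
    """Pick a routing target for content_id.

    ranking is a list of content IDs ordered from most to least popular.
    It is a hypothetical stand-in for the router's internal ranking list.
    """
    # The generated config stores the cutoff as a float (5.0), so cast it.
    top_items = ranking[:int(cutoff)]
    return on_popular if content_id in top_items else on_unpopular


# Example ranking with made-up content IDs.
ranking = ["news", "match", "movie", "series", "trailer", "archive"]

print(route_by_popularity(ranking, "movie", 5.0, "edge-streamer", "offload"))
print(route_by_popularity(ranking, "archive", 5.0, "edge-streamer", "offload"))
```

In this sketch, content that does not appear in the ranking at all is treated as unpopular.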

The following configuration settings related to content popularity are available:

$ confcli services.routing.settings.contentPopularity
{
    "contentPopularity": {
        "enabled": true,
        "algorithm": "score_based",
        "sessionGroupNames": [],
        "popularityListMaxSize": 100000,
        "scoreBased": {
            "popularityDecayFraction": 0.2,
            "popularityPredictionFactor": 2.5,
            "requestsBetweenPopularityDecay": 1000
        },
        "timeBased": {
            "intervalsPerHour": 10
        }
    }
}

enabled: Whether to track content popularity. When enabled is set to false, content popularity is not tracked. Routing on content popularity is still possible while enabled is false, provided that content popularity has been tracked previously.
algorithm: The content popularity tracking algorithm to use, either score_based or time_based (both are detailed below).
sessionGroupNames: Names of the session groups for which content popularity should be tracked. If left empty, content popularity is tracked for all sessions. Content popularity is tracked globally, not per session group, but the popularity metrics are only updated for sessions belonging to these groups.
popularityListMaxSize: The maximum number of unique content items to track for popularity.
scoreBased: Configuration parameters unique to the score-based algorithm.
timeBased: Configuration parameters unique to the time-based algorithm.

Size of Popularity List

The size of the popularity list is limited to prevent it from growing without bound. A single entry in the popularity ranking list consumes at most 180 bytes of memory. For example, a maximum size of 1,000 would consume at most 180 ⋅ 1,000 = 180,000 B = 0.18 MB. If the content popularity list is full, a request for a new item replaces the least popular item.

Setting a very high maximum size will not impact performance; it will only consume more memory.
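The memory bound can be checked for any list size with a few lines of Python. The helper name `max_popularity_list_bytes` is made up for this example; the only input taken from this page is the 180-byte upper bound per entry.

```python
BYTES_PER_ENTRY = 180  # upper bound per ranking entry, as stated above


def max_popularity_list_bytes(max_size):
    """Worst-case memory use of a popularity list holding max_size entries."""
    return BYTES_PER_ENTRY * max_size


# 1,000 is the example used above; 100,000 is the popularityListMaxSize
# value shown in the configuration listing earlier on this page.
for size in (1_000, 100_000):
    megabytes = max_popularity_list_bytes(size) / 1_000_000
    print(f"{size} entries: at most {megabytes} MB")
```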

Score-Based Algorithm

The requestsBetweenPopularityDecay parameter defines the number of requests between consecutive popularity decay updates. The decay update is an integral component of the score-based algorithm.

The popularityPredictionFactor and popularityDecayFraction settings tune the behaviour of the content popularity ranking algorithm, explained further below.

Decay Update

To allow popular content to rise quickly in popularity and unpopular content to sink, a dynamic popularity ranking algorithm is used. The goal of the algorithm is to track content popularity in real time, allowing routing decisions based on the requested content’s popularity. The algorithm is applied on every decay update.

The algorithm uses current trending content to predict content popularity. The popularityPredictionFactor setting regulates how much the algorithm should rely on predicted popularity. A high prediction factor allows rising content to quickly rise to high popularity but can also cause unpopular content with a sudden burst of requests to wrongfully rise to the top. A low prediction factor can cause stagnation in the popularity ranking, not allowing new popular content to rise to the top.

The popularity of unpopular content decays by an amount regulated by popularityDecayFraction. A high value will aggressively decay content popularity on every decay update, while a low value will bloat the ranking, causing stagnation. Once content decays to a trivially low popularity score, it is pruned from the content popularity list.
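The exact scoring formula is not given on this page, so the following Python sketch is only a toy model of how a decay update could interact with request counting. Every detail is an assumption made for illustration: each request adds 1 to a score, a decay update multiplies all scores by (1 - popularityDecayFraction), and the pruning threshold `prune_below` is hypothetical. The prediction step controlled by popularityPredictionFactor is deliberately left out.

```python
class ToyScoreTracker:
    """Toy model of decay and pruning; not the router's actual algorithm."""

    def __init__(self, decay_fraction=0.2, requests_between_decay=1000,
                 prune_below=0.01):
        self.decay_fraction = decay_fraction
        self.requests_between_decay = requests_between_decay
        self.prune_below = prune_below  # hypothetical pruning threshold
        self.scores = {}
        self.requests_since_decay = 0

    def record_request(self, content_id):
        # Assumption: every request adds a fixed amount to the score.
        self.scores[content_id] = self.scores.get(content_id, 0.0) + 1.0
        self.requests_since_decay += 1
        if self.requests_since_decay >= self.requests_between_decay:
            self.decay_update()
            self.requests_since_decay = 0

    def decay_update(self):
        # Assumption: multiplicative decay, then pruning of trivial scores.
        keep = 1.0 - self.decay_fraction
        self.scores = {
            content_id: score * keep
            for content_id, score in self.scores.items()
            if score * keep >= self.prune_below
        }

    def ranking(self):
        """Content IDs ordered from highest to lowest score."""
        return sorted(self.scores, key=self.scores.get, reverse=True)


tracker = ToyScoreTracker(decay_fraction=0.2, requests_between_decay=4)
for content_id in ["a", "a", "a", "b"]:
    tracker.record_request(content_id)
print(tracker.ranking(), tracker.scores)
```

In this toy model, a larger decay fraction shrinks every score faster, so items that stop receiving requests reach the pruning threshold after fewer decay updates.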

When configuring these tuning parameters, the most crucial data to consider is the size of your asset catalog, i.e. the number of unique content items you offer. The recommended values, obtained through testing, are presented in the table below. Note that the popularityPredictionFactor setting is the principal factor in controlling the algorithm’s behaviour.

Catalog size n	Popularity prediction factor	Popularity decay fraction
n < 1000	2.2	0.2
1000 < n < 5000	2.3	0.2
5000 < n < 10000	2.5	0.2
n > 10000	2.6	0.2
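The table can be expressed as a small lookup helper, sketched here in Python. The function name `recommended_score_settings` is hypothetical. The table does not say which row applies when the catalog size is exactly 1000, 5000 or 10000, so this sketch assumes the row for the larger catalog in those cases.

```python
def recommended_score_settings(catalog_size):
    """Return the recommended scoreBased values for a given catalog size.

    Assumption: a size exactly on a boundary (1000, 5000, 10000) is
    assigned to the row for the larger catalog.
    """
    if catalog_size < 1000:
        factor = 2.2
    elif catalog_size < 5000:
        factor = 2.3
    elif catalog_size < 10000:
        factor = 2.5
    else:
        factor = 2.6
    return {
        "popularityPredictionFactor": factor,
        "popularityDecayFraction": 0.2,
    }


print(recommended_score_settings(7500))
```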

Time-Based Algorithm

The time-based algorithm only requires the configuration parameter intervalsPerHour. As an example, setting intervalsPerHour to 10 gives ten six-minute intervals per hour. Within each interval, every unique content item has an associated counter, which increases by one for each incoming request for that item. After an hour, all intervals have been cycled through. The counters in the first interval are then reset, and incoming content requests increase the counters in the first interval again. This cycle continues indefinitely.

The popularity of a content item is the sum of its counters across all intervals. The popularity ranking is determined by comparing these sums.
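The interval cycling described above can be sketched in Python as a ring of per-interval counters. This is an illustration under stated assumptions, not the router's code: the class name `ToyTimeBasedTracker`, its methods, and the use of a caller-supplied timestamp in seconds are all made up for this example.

```python
from collections import Counter


class ToyTimeBasedTracker:
    """Ring of per-interval request counters covering one hour."""

    def __init__(self, intervals_per_hour=10):
        self.n = intervals_per_hour
        self.interval_seconds = 3600.0 / intervals_per_hour
        self.buckets = [Counter() for _ in range(self.n)]
        self.last_index = None  # absolute index of the latest interval seen

    def _advance(self, now):
        """Reset every interval entered since the last call; return its slot."""
        index = int(now // self.interval_seconds)
        if self.last_index is not None and index > self.last_index:
            # At most one full cycle of slots needs to be cleared.
            steps = min(index - self.last_index, self.n)
            for i in range(index - steps + 1, index + 1):
                self.buckets[i % self.n].clear()
        if self.last_index is None or index > self.last_index:
            self.last_index = index
        return index % self.n

    def record_request(self, content_id, now):
        """Count one request for content_id at time now (in seconds)."""
        self.buckets[self._advance(now)][content_id] += 1

    def popularity(self, content_id, now):
        """Sum of the item's counters across all intervals."""
        self._advance(now)
        return sum(bucket[content_id] for bucket in self.buckets)


tracker = ToyTimeBasedTracker(intervals_per_hour=10)
tracker.record_request("a", now=0)      # falls in the first interval
tracker.record_request("a", now=400)    # falls in the second interval
print(tracker.popularity("a", now=500))
print(tracker.popularity("a", now=3700))
```

In this sketch a request stops counting once the cycle returns to its interval, so the request made at time 0 is no longer included when popularity is read at 3700 seconds.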