Content popularity
ESB3024 Router allows routing decisions based on content popularity. All incoming content requests are tracked to continuously update a content popularity ranking list. The popularity ranking algorithm is designed to let popular content quickly rise to the top while unpopular content decays and sinks towards the bottom.
Routing
A content popularity-based routing rule can be created by running:
```text
$ confcli services.routing.rules -w
Running wizard for resource 'rules'
Hint: Hitting return will set a value to its default.
Enter '?' to receive the help string
rules : [
rule can be one of
1: allow
2: consistentHashing
3: contentPopularity
4: deny
5: firstMatch
6: random
7: rawGroup
8: rawHost
9: split
10: weighted
Choose element index or name: contentPopularity
Adding a 'contentPopularity' element
rule : {
name (default: ): content_popularity_rule
type (default: contentPopularity):
contentPopularityCutoff (default: 10): 5
onPopular (default: ): edge-streamer
onUnpopular (default: ): offload
}
Add another 'rule' element to array 'rules'? [y/N]: n
]
Generated config:
{
  "rules": [
    {
      "name": "content_popularity_rule",
      "type": "contentPopularity",
      "contentPopularityCutoff": 5.0,
      "onPopular": "edge-streamer",
      "onUnpopular": "offload"
    }
  ]
}
Merge and apply the config? [y/n]: y
```
This rule routes requests for the 5 most popular contents to `edge-streamer` and all other requests to `offload`.
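The routing decision made by this rule can be modelled in a few lines. The sketch below is a hypothetical model, not the router's code: it assumes a 1-based popularity rank (1 = most popular), treats content that is absent from the popularity list (`None`) as unpopular, and the function name `route_content_popularity` is illustrative only.

```python
from typing import Optional


def route_content_popularity(
    rank: Optional[int],
    cutoff: float,
    on_popular: str,
    on_unpopular: str,
) -> str:
    # Hypothetical model: rank is 1-based (1 = most popular); None means the
    # content is not currently present in the popularity ranking list.
    if rank is not None and rank <= cutoff:
        return on_popular
    return on_unpopular


# Mirrors the generated rule: cutoff 5.0, edge-streamer vs. offload.
for rank in (1, 5, 6, None):
    print(rank, route_content_popularity(rank, 5.0, "edge-streamer", "offload"))
```

Because `contentPopularityCutoff` is stored as a number (`5.0`), comparing the rank with `<=` means ranks 1 through 5 are routed to `edge-streamer`.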
The following configuration settings relate to content popularity:
```text
$ confcli services.routing.settings.contentPopularity
{
  "contentPopularity": {
    "enabled": true,
    "algorithm": "score_based",
    "sessionGroupNames": []
  }
}
```
- `enabled`: Whether or not to track content popularity. When `enabled` is set to `false`, content popularity is not tracked. Note that routing on content popularity is still possible when `enabled` is `false`, provided content popularity has been tracked previously.
- `algorithm`: The content popularity tracking algorithm to use. There are two possible choices: `score_based` or `time_based` (detailed below).
- `sessionGroupNames`: Names of the session groups for which content popularity should be tracked. Note that content popularity is tracked globally, not per session group.
Algorithm tuning
The behaviour of each content popularity tracking algorithm can be tuned using the raw JSON API.
All configuration parameters for content popularity reside in the `settings` object of the configuration, as in the example below:
```json
{
  "settings": {
    "content_popularity": {
      "algorithm": "score_based",
      "session_group_names": ["vod_only"],
      "score_based": {
        "requests_between_popularity_decay": 1000,
        "popularity_list_max_size": 100000,
        "popularity_prediction_factor": 2.5,
        "popularity_decay_fraction": 0.2
      },
      "time_based": {
        "intervals_per_hour": 10
      }
    }
  }
}
```
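Misspelled algorithm names and stray characters inside quoted key names are easy to overlook in hand-edited JSON. The following sketch is a small, hypothetical pre-flight check. It is separate from any validation the router itself performs and only looks for the two mistakes it names. The function name `check_content_popularity` is illustrative.

```python
import json

VALID_ALGORITHMS = {"score_based", "time_based"}


def check_content_popularity(doc: dict) -> list[str]:
    """Return human-readable problems found in settings.content_popularity."""
    cp = doc.get("settings", {}).get("content_popularity")
    if cp is None:
        return ["settings.content_popularity is missing"]
    problems = []
    algorithm = cp.get("algorithm")
    if algorithm not in VALID_ALGORITHMS:
        problems.append(f"unknown algorithm {algorithm!r}")
    for key in cp:
        # A trailing colon inside the quotes silently creates a different key.
        if key.endswith(":"):
            problems.append(f"key {key!r} has a stray trailing colon")
    return problems


bad = json.loads("""
{"settings": {"content_popularity": {
    "algorithm": "scored_based",
    "score_based:": {"popularity_decay_fraction": 0.2}}}}
""")
good = json.loads("""
{"settings": {"content_popularity": {
    "algorithm": "score_based",
    "score_based": {"popularity_decay_fraction": 0.2}}}}
""")
print(check_content_popularity(bad))
print(check_content_popularity(good))  # []
```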
The field `algorithm` dictates which content popularity tracking algorithm to use. It can be either `score_based` or `time_based`.
The field `session_group_names` defines the session groups for which content popularity should be tracked. In the example above, sessions belonging to the `vod_only` session group are tracked for content popularity. If left empty, content popularity is tracked for all sessions.
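The filtering rule described above can be sketched as a single predicate. This is a hypothetical model: it assumes a session may belong to several session groups (represented as a set of names), and the function name `should_track` is illustrative. Tracked requests still feed one global ranking.

```python
def should_track(session_groups: set[str], session_group_names: list[str]) -> bool:
    # An empty session_group_names list means "track all sessions".
    if not session_group_names:
        return True
    # Otherwise, track only sessions in at least one of the listed groups.
    return not session_groups.isdisjoint(session_group_names)


print(should_track({"vod_only"}, ["vod_only"]))  # True
print(should_track({"live"}, ["vod_only"]))      # False
print(should_track({"live"}, []))                # True
```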
The remaining configuration parameters are algorithm specific.
Score based algorithm
The field `popularity_list_max_size` defines the maximum number of unique contents to track for popularity, which can be used to limit memory growth. A single entry in the popularity ranking list consumes at most 180 bytes of memory. For example, `"popularity_list_max_size": 1000` would consume at most 180 · 1,000 = 180,000 B = 0.18 MB. If the content popularity list is full, a request for new unique content replaces the least popular content. Setting a very high max size does not impact performance; it only consumes more memory.
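The memory bound and the replacement behaviour can be illustrated together. The 180-byte figure comes from this section. The `BoundedPopularityList` class is a hypothetical toy model. It uses plain request counts as scores and a linear scan to find the least popular entry for clarity, and it is not the router's data structure.

```python
BYTES_PER_ENTRY_MAX = 180  # documented upper bound per ranking-list entry


def max_memory_bytes(popularity_list_max_size: int) -> int:
    """Worst-case memory used by the popularity ranking list."""
    return BYTES_PER_ENTRY_MAX * popularity_list_max_size


class BoundedPopularityList:
    """Toy model: keeps at most max_size entries, evicting the lowest score."""

    def __init__(self, max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.scores: dict[str, float] = {}

    def record(self, content_id: str) -> None:
        # New content arriving at a full list replaces the least popular entry.
        if content_id not in self.scores and len(self.scores) >= self.max_size:
            least = min(self.scores, key=self.scores.__getitem__)
            del self.scores[least]
        self.scores[content_id] = self.scores.get(content_id, 0.0) + 1.0


print(max_memory_bytes(1000))    # 180000 bytes = 0.18 MB
print(max_memory_bytes(100000))  # 18000000 bytes = 18 MB
```

With the example configuration's `"popularity_list_max_size": 100000`, the bound works out to 18 MB.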
The field `requests_between_popularity_decay` defines the number of requests between each popularity decay update, an integral component of this feature.
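Triggering a decay update every N requests only needs a counter. The `DecayScheduler` class below is a hypothetical sketch of that trigger. It assumes the counter restarts after each update, and `on_decay` stands in for whatever the update itself does.

```python
from typing import Callable


class DecayScheduler:
    """Calls on_decay once every requests_between_popularity_decay requests."""

    def __init__(self, requests_between_popularity_decay: int,
                 on_decay: Callable[[], None]) -> None:
        self.interval = requests_between_popularity_decay
        self.on_decay = on_decay
        self.count = 0

    def on_request(self) -> None:
        self.count += 1
        if self.count >= self.interval:
            self.count = 0  # restart counting toward the next decay update
            self.on_decay()


decays = []
scheduler = DecayScheduler(1000, lambda: decays.append(1))
for _ in range(2500):
    scheduler.on_request()
print(len(decays))  # 2
```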
The fields `popularity_prediction_factor` and `popularity_decay_fraction` tune the behaviour of the content popularity ranking algorithm, as explained below.
Decay update
To allow popular content to rise quickly in popularity and unpopular content to sink, a dynamic popularity ranking algorithm is used. The goal of the algorithm is to track content popularity in real time, allowing routing decisions based on the requested content’s popularity. The algorithm is applied at every decay update.
The algorithm uses currently trending content to predict content popularity. The field `popularity_prediction_factor` regulates how much the algorithm relies on predicted popularity. A high prediction factor allows trending content to rise quickly to high popularity, but can also cause unpopular content with a sudden burst of requests to wrongly rise to the top. A low prediction factor can cause stagnation in the popularity ranking, preventing new popular content from rising to the top.
Unpopular content decays in popularity, and the magnitude of the decay is regulated by `popularity_decay_fraction`. A high value aggressively decays content popularity at every decay update, while a low value bloats the ranking, causing stagnation. Once content decays to a trivially low popularity score, it is pruned from the content popularity list.
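The exact scoring formula is not documented here. The sketch below is therefore only a hypothetical illustration of how a prediction factor and a decay fraction might interact in one decay update. It assumes that recent requests are weighted by `popularity_prediction_factor`, that the previous score keeps `1 - popularity_decay_fraction` of its value, and that scores below an arbitrary `prune_below` threshold are dropped. All three are assumptions, not the router's algorithm.

```python
def decay_update(scores: dict[str, float],
                 recent_requests: dict[str, int],
                 prediction_factor: float,
                 decay_fraction: float,
                 prune_below: float = 0.01) -> dict[str, float]:
    """Hypothetical decay update; the formula is illustrative only."""
    new_scores: dict[str, float] = {}
    for cid in set(scores) | set(recent_requests):
        old = scores.get(cid, 0.0)
        recent = recent_requests.get(cid, 0)
        # Assumed: recent demand boosted by the prediction factor,
        # previous score reduced by the decay fraction.
        score = (1.0 - decay_fraction) * old + prediction_factor * recent
        # Entries that have decayed to a trivially low score are pruned.
        if score >= prune_below:
            new_scores[cid] = score
    return new_scores


print(decay_update({"a": 10.0, "b": 0.005}, {"c": 3}, 2.5, 0.2))
```

In this toy model, raising the prediction factor lets `c` overtake `a` after fewer requests, and raising the decay fraction shrinks `a` faster. This mirrors the trade-offs described above.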
When configuring these tuning parameters, the most crucial factor to consider is the size of your asset catalog, i.e. the number of unique contents you offer. The recommended values, obtained through testing, are presented in the table below. Note that `popularity_prediction_factor` is the principal factor in controlling the algorithm’s behaviour.
Catalog size n | popularity_prediction_factor | popularity_decay_fraction |
---|---|---|
n < 1000 | 2.2 | 0.2 |
1000 < n < 5000 | 2.3 | 0.2 |
5000 < n < 10000 | 2.5 | 0.2 |
n > 10000 | 2.6 | 0.2 |
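The table can be turned into a lookup when generating configuration for different deployments. In this sketch, `recommended_tuning` is a hypothetical helper. The table does not cover catalog sizes of exactly 1000, 5000 or 10000, so the sketch assigns those boundary values to the lower bucket. That choice is an assumption.

```python
def recommended_tuning(catalog_size: int) -> dict[str, float]:
    # Boundaries (exactly 1000, 5000, 10000) are not covered by the table;
    # assigning them to the lower bucket is an assumption of this sketch.
    if catalog_size <= 1000:
        factor = 2.2
    elif catalog_size <= 5000:
        factor = 2.3
    elif catalog_size <= 10000:
        factor = 2.5
    else:
        factor = 2.6
    return {
        "popularity_prediction_factor": factor,
        "popularity_decay_fraction": 0.2,
    }


print(recommended_tuning(7000))
```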
Time based algorithm
The time based algorithm only requires the configuration parameter `intervals_per_hour`. For example, `"intervals_per_hour": 10` gives ten six-minute intervals per hour. During each interval, every unique content has an associated counter that increases by one for each incoming request. After an hour, all intervals have been cycled through. The counters in the first interval are then reset, and incoming content requests increase the counters in the first interval again. This cycle continues indefinitely.
To determine a content’s popularity, its counters in all intervals are summed, and these sums are used to rank the contents.
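The interval cycle and the summed ranking can be modelled with one counter per interval. The `TimeBasedPopularity` class below is a hypothetical sketch. It assumes an external clock calls `advance()` at the end of each interval, and that each interval's counters are reset when the cycle re-enters that interval.

```python
from collections import Counter


class TimeBasedPopularity:
    """Toy model: one request counter per interval, cycling once per hour."""

    def __init__(self, intervals_per_hour: int) -> None:
        self.intervals = [Counter() for _ in range(intervals_per_hour)]
        self.interval_minutes = 60 / intervals_per_hour
        self.current = 0

    def record(self, content_id: str) -> None:
        # Each request increments the content's counter in the current interval.
        self.intervals[self.current][content_id] += 1

    def advance(self) -> None:
        # Move to the next interval (wrapping after an hour) and reset it.
        self.current = (self.current + 1) % len(self.intervals)
        self.intervals[self.current].clear()

    def ranking(self) -> list[tuple[str, int]]:
        # Popularity is the sum of a content's counters over all intervals.
        total = Counter()
        for counts in self.intervals:
            total.update(counts)
        return total.most_common()


tb = TimeBasedPopularity(10)
print(tb.interval_minutes)  # 6.0
for cid in ("a", "a", "b"):
    tb.record(cid)
tb.advance()
tb.record("b")
tb.record("b")
print(tb.ranking())  # [('b', 3), ('a', 2)]
```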