Monitoring within the AgileTV CDN Manager Cluster

How to monitor the health of the nodes within the AgileTV CDN Manager cluster.

As of AgileTV CDN Manager 1.2.0, all nodes that are part of the cluster can be monitored using Telegraf, allowing for the collection of hardware metrics. This enables monitoring of the health and performance of any node within the cluster.

Telegraf Installation and Configuration

In the mounted ISO directory for the AgileTV CDN Manager, the ./Packages directory contains the Telegraf RPM:

$ ls -1 /mnt/manager/Packages/
telegraf-1.34.3-1.x86_64.rpm
...

Use this RPM to install Telegraf on all nodes that should be monitored. This RPM is compatible with RHEL 8 and 9 and their derivatives (CentOS, Oracle Linux, etc.).
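
As a minimal sketch, the package can be installed directly from the mounted ISO with dnf. The path below is taken from the listing above and assumes that the ISO is mounted at /mnt/manager; the exact RPM version may differ between releases.

```shell
# Assumed location of the RPM; adjust to your mount point and version.
RPM=/mnt/manager/Packages/telegraf-1.34.3-1.x86_64.rpm

if [ -f "$RPM" ]; then
  # Install the local package; dnf resolves any dependencies.
  sudo dnf install -y "$RPM"
  # Confirm that the package is now registered.
  rpm -q telegraf
else
  echo "Telegraf RPM not found at $RPM" >&2
fi
```

Repeat the installation on every node that should be monitored.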

Telegraf will be configured to achieve the following:

  • Collect hardware metrics covering system load, memory utilization, and disk usage.
  • Send the collected metrics to an instance of the service acd-telegraf-metrics-database. This service is installed alongside ESB3024 AgileTV CDN Director. See acd-telegraf-metrics-database for more details.

Assuming that ESB3024 AgileTV CDN Director has been installed on the host director-host, replace /etc/telegraf/telegraf.conf with the following configuration:

[agent]
  interval = "10s"
  round_interval = true
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  debug = false
  quiet = false
  logfile = ""

# Output to acd-telegraf-metrics-database instance
[[outputs.influxdb_v2]]
  urls = ["http://director-host:8086"]

  # If acd-telegraf-metrics-database is configured to use TLS with a self-signed
  # certificate, uncomment the following line.
  # insecure_skip_verify = true

  # If acd-telegraf-metrics-database is configured to use token authentication,
  # uncomment the following line.
  # token = "@{secretstore:metrics_auth_token}"

# Secret store for storing sensitive data
[[secretstores.os]]
  id = "secretstore"

## System Load Metrics
[[inputs.system]]
  fieldinclude = ["load1", "load5", "load15"]

## Memory Metrics
[[inputs.mem]]
  fieldinclude = ["used_percent", "total"]

## Disk Metrics
[[inputs.disk]]
  mount_points = ["/"]
  ## Ignore file system types.
  ignore_fs = ["tmpfs", "devtmpfs", "overlay"]
  ## Report only the used_percent and total fields.
  fieldinclude = ["used_percent", "total"]

Run systemctl restart telegraf to apply the changes. To verify that Telegraf is running, execute systemctl status telegraf or check the Telegraf logs with journalctl -fu telegraf.
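
The configuration can also be checked with a one-off test run, either before or after restarting the service. The following is a sketch that assumes the default configuration path used in this guide. The --test flag makes Telegraf gather each input once and print the result to standard output without writing to any output, so it does not verify connectivity to director-host.

```shell
# Configuration path used in this guide.
CONF=/etc/telegraf/telegraf.conf

if command -v telegraf >/dev/null 2>&1 && [ -r "$CONF" ]; then
  # Gather every configured input once and print the metrics to stdout.
  telegraf --config "$CONF" --test
else
  echo "telegraf or $CONF not available; skipping test run" >&2
fi
```

If the configuration parses correctly, the output should contain lines for the system, mem, and disk measurements.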

Additional input plugins can be added to the Telegraf configuration to collect more metrics. Visit the Telegraf plugin directory for a list of available input plugins.
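
For example, the configuration above reports load averages but not CPU utilization. The sketch below writes a hypothetical fragment that enables the standard cpu input plugin. The file name cpu.conf and the selected fields are illustrative assumptions and are not part of the shipped configuration.

```shell
# Write an illustrative fragment to cpu.conf in the current directory.
cat > cpu.conf <<'EOF'
## CPU Usage Metrics (illustrative addition)
[[inputs.cpu]]
  ## Report one aggregate series instead of one series per core.
  percpu = false
  totalcpu = true
  ## Keep only a few of the usage fields.
  fieldinclude = ["usage_user", "usage_system", "usage_idle"]
EOF
```

Append the contents of cpu.conf to /etc/telegraf/telegraf.conf and restart Telegraf for the plugin to take effect.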

Metrics Token Authentication

If the acd-telegraf-metrics-database instance is configured to use token authentication with secrets, special configuration is required to access the secret store. See Using Secrets for Request Authorization for more details on how acd-telegraf-metrics-database uses tokens.

The token field in the [[outputs.influxdb_v2]] section must be uncommented. The secret value must match the secret value of the acd-telegraf-metrics-database service. To set the secret value, use the following command:

$ sudo -u telegraf telegraf secrets set secretstore metrics_auth_token
Enter secret value:

Note that the command above must be run as the user telegraf since the Telegraf service runs as this user.

The command prompts for the secret value and stores it in the secret store secretstore under the key metrics_auth_token. The secret store name and secret key must match the values referenced in the token field of the [[outputs.influxdb_v2]] section of the Telegraf configuration.
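
The relationship between the store ID, the key, and the reference in the configuration can be sketched as follows. The variable names STORE, KEY, and REF are hypothetical helpers; the final step assumes that the telegraf secrets list subcommand is available in the installed version and reads the default configuration file.

```shell
STORE=secretstore        # id from the [[secretstores.os]] section
KEY=metrics_auth_token   # key passed to "telegraf secrets set"

# Reference expected in the token field of [[outputs.influxdb_v2]].
REF="@{${STORE}:${KEY}}"
echo "token = \"$REF\""

if command -v telegraf >/dev/null 2>&1; then
  # List the keys known to the store, as the telegraf user.
  sudo -u telegraf telegraf secrets list "$STORE"
fi
```

If the key does not appear in the listing, the secret was most likely set as a different user or under a different store ID.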

Visualizing the Metrics

If Telegraf is running and configured correctly, it sends the metrics to the acd-telegraf-metrics-database service on the host director-host. The Prometheus instance on the same host periodically scrapes this service, and the metrics can then be visualized in Grafana. Grafana is accessible at http://director-host:3000, where the metrics are available under the following names:

disk_total
disk_used_percent
mem_total
mem_used_percent
system_load1
system_load15
system_load5
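
To confirm that the metrics have reached Prometheus without going through Grafana, the Prometheus HTTP API can be queried directly. This is a sketch only: it assumes that Prometheus listens on its default port 9090 on director-host, which this guide does not state, so adjust PROM_URL to match your installation.

```shell
# Assumed Prometheus address; override by exporting PROM_URL.
PROM_URL="${PROM_URL:-http://director-host:9090}"
# Any metric name from the list above can be used here.
QUERY='mem_used_percent'

# Run an instant query; -G sends the URL-encoded data as a GET query string.
curl -sG --max-time 5 "$PROM_URL/api/v1/query" \
  --data-urlencode "query=$QUERY" \
  || echo "Prometheus not reachable at $PROM_URL" >&2
```

A JSON response with "status":"success" and a non-empty result list indicates that the metric is being scraped.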