Monitoring within the AgileTV CDN Manager Cluster
As of AgileTV CDN Manager 1.2.0, all nodes that are part of the cluster can be monitored using Telegraf, allowing for the collection of hardware metrics. This enables monitoring of the health and performance of any node within the cluster.
Telegraf Installation and Configuration
In the mounted ISO directory for the AgileTV CDN Manager, the ./Packages directory contains the Telegraf RPM:
$ ls -1 /mnt/manager/Packages/
telegraf-1.34.3-1.x86_64.rpm
...
Use this RPM to install Telegraf on all nodes that should be monitored. This RPM is compatible with RHEL 8 and 9 and their derivatives (CentOS, Oracle Linux, etc.).
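The installation step can be sketched as a small shell helper. This is a minimal sketch, not an official procedure: the function name install_telegraf is hypothetical, the default path assumes the ISO is mounted at /mnt/manager as in the listing above, and it assumes dnf is the package manager on the node.

```shell
# Hypothetical helper: install the Telegraf RPM from the mounted ISO
# and start the service. Assumes dnf and systemd are available.
install_telegraf() {
    # Default path assumes the ISO is mounted at /mnt/manager.
    rpm_path="${1:-/mnt/manager/Packages/telegraf-1.34.3-1.x86_64.rpm}"

    # dnf can install a local RPM file and resolves its dependencies.
    sudo dnf install -y "$rpm_path" || return 1

    # Start Telegraf now and enable it on every boot.
    sudo systemctl enable --now telegraf
}
```

Run install_telegraf on each node that should be monitored, passing a different RPM path as the first argument if the ISO is mounted elsewhere.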
Telegraf will be configured to achieve the following:
- Collect hardware metrics for system load, memory utilization, and disk usage.
- Send the collected metrics to an instance of the acd-telegraf-metrics-database service. This service is installed alongside ESB3024 AgileTV CDN Director. See acd-telegraf-metrics-database for more details.
Assuming that ESB3024 AgileTV CDN Director has been installed on the host director-host, replace /etc/telegraf/telegraf.conf with the following configuration:
[agent]
interval = "10s"
round_interval = true
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
debug = false
quiet = false
logfile = ""
# Output to acd-telegraf-metrics-database instance
[[outputs.influxdb_v2]]
urls = ["http://director-host:8086"]
# If acd-telegraf-metrics-database is configured to use TLS with a self-signed
# certificate, uncomment the following line.
# insecure_skip_verify = true
# If acd-telegraf-metrics-database is configured to use token authentication,
# uncomment the following line.
# token = "@{secretstore:metrics_auth_token}"
# Secret store for storing sensitive data
[[secretstores.os]]
id = "secretstore"
## CPU Metrics
[[inputs.system]]
fieldinclude = ["load1", "load5", "load15"]
## Memory Metrics
[[inputs.mem]]
fieldinclude = ["used_percent", "total"]
## Disk Metrics
[[inputs.disk]]
mount_points = ["/"]
## Ignore file system types.
ignore_fs = ["tmpfs", "devtmpfs", "overlay"]
## Report only the used_percent and total fields.
fieldinclude = ["used_percent", "total"]
Run systemctl restart telegraf to apply the changes. To verify that Telegraf is running, execute systemctl status telegraf or check the Telegraf logs with journalctl -fu telegraf.
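Before or after restarting, the configuration can also be exercised with Telegraf's --test flag, which gathers each input once and prints the resulting metrics to stdout instead of sending them to the configured outputs. The sketch below wraps that call; the function name check_telegraf_config is hypothetical, and running it as the telegraf user is an assumption made so that file permissions match those of the service.

```shell
# Hypothetical helper: run the inputs once and print the gathered
# metrics to stdout; nothing is sent to the outputs in --test mode.
check_telegraf_config() {
    conf="${1:-/etc/telegraf/telegraf.conf}"

    # Run as the telegraf user (assumption: same user as the service).
    sudo -u telegraf telegraf --config "$conf" --test
}
```

A non-zero exit status or an error message from check_telegraf_config usually points to a syntax error or an unknown plugin option in the configuration file.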
Additional input plugins can be added to the Telegraf configuration to collect more metrics. Visit the Telegraf plugin directory for a list of available input plugins.
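As an illustration, the fragment below adds Telegraf's standard cpu input plugin, which reports CPU utilization percentages in addition to the load averages gathered by the system plugin. It is a sketch only: the choice of fields is an example, and whether acd-telegraf-metrics-database exposes these additional fields to Prometheus is an assumption that should be verified.

```toml
## CPU utilization (example addition, not part of the default configuration)
[[inputs.cpu]]
  ## Report one aggregated series instead of one per core.
  percpu = false
  totalcpu = true
  ## Keep only a few utilization fields.
  fieldinclude = ["usage_idle", "usage_user", "usage_system"]
```

Append the fragment to /etc/telegraf/telegraf.conf and restart Telegraf for it to take effect.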
Metrics Token Authentication
If the acd-telegraf-metrics-database instance is configured to use token authentication with secrets, special configuration is required to access the secret store. See Using Secrets for Request Authorization for more details on how acd-telegraf-metrics-database uses tokens.
In the Telegraf configuration, the token field in the [[outputs.influxdb_v2]] section must be uncommented.
The secret value must match the secret value of the acd-telegraf-metrics-database service. To set the secret value, use the following command:
$ sudo -u telegraf telegraf secrets set secretstore metrics_auth_token
Enter secret value:
Note that the command above must be run as the user telegraf, since the Telegraf service runs as this user.
This command will prompt you to enter the secret value to be stored in the secret store secretstore with the key metrics_auth_token. Note that the secret store name and secret key must match the values used in the [[outputs.influxdb_v2]] section of the Telegraf configuration. The secret value must be the same as the one used in acd-telegraf-metrics-database.
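To confirm that the key was stored, the telegraf secrets list subcommand can be used. The sketch below assumes the default configuration path is in use so that Telegraf can find the secretstore definition; the function name list_telegraf_secrets is hypothetical.

```shell
# Hypothetical helper: list the keys held in a Telegraf secret store.
# Must run as the telegraf user, like the "secrets set" command.
list_telegraf_secrets() {
    store_id="${1:-secretstore}"

    # By default only the keys are listed; the values are not revealed.
    sudo -u telegraf telegraf secrets list "$store_id"
}
```

If metrics_auth_token does not appear in the output of list_telegraf_secrets, repeat the secrets set command and check that it was run as the telegraf user.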
Visualizing the Metrics
If Telegraf is running and configured correctly, the metrics will be sent to the acd-telegraf-metrics-database service on the host director-host. This service is periodically scraped by the same host's Prometheus instance, which is used to visualize the metrics in Grafana. Grafana is accessible at http://director-host:3000, where the metrics are available under the following names:
disk_total
disk_used_percent
mem_total
mem_used_percent
system_load1
system_load15
system_load5
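To check that a metric has reached Prometheus without opening Grafana, the Prometheus HTTP query API can be called directly. This is a sketch under stated assumptions: the function name query_metric is hypothetical, and the address http://director-host:9090 assumes Prometheus listens on its default port, which this document does not confirm.

```shell
# Hypothetical helper: query one metric through the Prometheus HTTP API.
query_metric() {
    metric="$1"
    # Assumption: Prometheus is reachable on its default port 9090.
    prom_url="${2:-http://director-host:9090}"

    # -G sends the URL-encoded data as a GET query string.
    curl -sG "$prom_url/api/v1/query" --data-urlencode "query=$metric"
}
```

For example, query_metric mem_used_percent returns a JSON document; a "status" of "success" with a non-empty result list indicates that samples for the metric have been scraped.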